Abstract

In this work, we study the problem of testing properties of the spectrum of a mixed quantum
state. Here one is given n copies of a mixed state ρ ∈ ℂ^{d×d} and the goal is to distinguish (with
high probability) whether ρ’s spectrum satisfies some property 𝒫 or whether it is at least ε-far
in ℓ1-distance from satisfying 𝒫. This problem was promoted in the survey of Montanaro and
de Wolf [MdW13] under the name of testing unitarily invariant properties of mixed states. It
is the natural quantum analogue of the classical problem of testing symmetric properties of
probability distributions.

Unlike property testing of probability distributions, where one generally hopes for algorithms
with sample complexity that is sublinear in the domain size, here the hope is for algorithms with
subquadratic copy complexity in the dimension d. This is because the (frequently rediscovered)
“empirical Young diagram (EYD) algorithm” [ARS88, KW01, HM02, CM06] can estimate the
spectrum of any mixed state up to ε-accuracy using only Õ(d²/ε²) copies. In this work, we show
that given a mixed state ρ ∈ ℂ^{d×d}:

• Θ(d/ε²) copies are necessary and sufficient to test whether ρ is the maximally mixed
state, i.e., has spectrum (1/d, ..., 1/d). This can be viewed as the quantum analogue of
Paninski’s [Pan08] sharp bounds for classical uniformity testing.

• Θ(r²/ε) copies are necessary and sufficient to test with one-sided error whether ρ has
rank r, i.e., has at most r nonzero eigenvalues. For two-sided error, a lower bound of
Ω(r/ε) copies holds.

• Θ̃(r²) copies are necessary and sufficient to distinguish whether ρ is maximally mixed on
an r-dimensional or an (r + 1)-dimensional subspace. More generally, for r vs. r + Δ (with
1 ≤ Δ ≤ r), Θ̃(r²/Δ) copies are necessary and sufficient.

• The EYD algorithm requires Ω(d²/ε²) copies to estimate the spectrum of ρ up to ε-accuracy,
nearly matching the known upper bound. In addition, we simplify part of the
proof of the Õ(d²/ε²) upper bound.

Our techniques involve the asymptotic representation theory of the symmetric group; in particular,
Kerov’s algebra of polynomial functions on Young diagrams.
1 Introduction


A common scenario in quantum mechanics involves an experimental apparatus which outputs a
particle whose state is a random variable. For example, in a version of the famous Stern-Gerlach
experiment by Phipps and Taylor [PT27], the experimental apparatus produced a hydrogen atom
whose electron was either in state |+1/2⟩ or |−1/2⟩, each with probability 1/2. More generally, one can
describe the output of such an apparatus as falling in an orthonormal set of states |Ψ_1⟩, ..., |Ψ_d⟩ ∈ ℂ^d
distributed according to a probability distribution 𝒟 = (p_1, ..., p_d). Such an object is called
a mixed state and is often conveniently represented using the density matrix
ρ = Σ_{i=1}^d p_i |Ψ_i⟩⟨Ψ_i| ∈ ℂ^{d×d}. The numbers p_1, ..., p_d are called the spectrum of ρ.

Given such an apparatus, a fundamental task, known as quantum state tomography, is to
produce an estimate ρ̂ ∈ ℂ^{d×d} which well-approximates ρ according to some distance measure
(typically, the trace distance). To do this, one repeatedly runs the apparatus to produce many
(say, n) independent copies of ρ and then one processes some measurement of ρ^{⊗n} to produce
an estimate ρ̂. It is known [FGLE12, Footnote 2] that O(d⁴ log(d)/ε²) copies of ρ are sufficient to
output an estimate which is ε-close to ρ in the trace distance. Unfortunately, the quartic dependence
on d can be prohibitively large, even for quite reasonable values of d; further exacerbating this is
the fact that many quantum systems are formed as the tensor product of many smaller subsystems,
in which case d is exponential in the number of subsystems.

One potential way around this problem is to note that if our actual goal in producing ρ̂ is to
determine whether ρ satisfies some property (e.g., is maximally mixed, has low rank, etc.), then our
estimate ρ̂ may be giving us far more information than we need. Thus, we can possibly test whether
ρ has the property in question using a much smaller number of copies. This is the motivation behind
the model of property testing of mixed states, as promoted in the recent survey of Montanaro and
de Wolf [MdW13]. Formally, we have the following definition:

Definition 1.1. A property of mixed states 𝒫 is testable with f(d, ε) copies if for every d ≥ 2, ε > 0
there is an algorithm T which, when given f(d, ε) copies of a mixed state ρ ∈ ℂ^{d×d}, behaves as
follows:

• If ρ satisfies 𝒫, then Pr[T accepts] ≥ 2/3. (“Completeness”)

• If ρ is ε-far in trace distance from all ρ′ satisfying 𝒫, then Pr[T rejects] ≥ 2/3. (“Soundness”)

The choice of probability 2/3 here is essentially arbitrary, and it can be amplified to 1 − δ at the
expense of increasing the number of copies by a factor of O(log(1/δ)).
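The amplification remark can be illustrated numerically. The sketch below is not taken from the paper: it assumes the standard approach of repeating the tester on fresh copies and taking a majority vote, and it assumes each run errs independently with probability exactly 1/3 (the worst case allowed by the definition). The helper names `majority_error` and `runs_needed` are hypothetical.

```python
from math import comb


def majority_error(k: int, p_err: float = 1 / 3) -> float:
    """Exact probability that a majority vote over k independent runs is wrong,
    assuming each run errs independently with probability p_err (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("use an odd number of runs so the vote cannot tie")
    # The vote is wrong exactly when more than half of the runs are wrong.
    return sum(
        comb(k, j) * p_err**j * (1 - p_err) ** (k - j)
        for j in range(k // 2 + 1, k + 1)
    )


def runs_needed(delta: float, p_err: float = 1 / 3) -> int:
    """Smallest odd number of runs whose majority vote errs with probability <= delta."""
    k = 1
    while majority_error(k, p_err) > delta:
        k += 2
    return k
```

Calling `runs_needed(delta)` for decreasing values of `delta` shows the number of repetitions growing roughly in proportion to log(1/δ), in line with the stated overhead.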

As mixed states are the quantum analogue of probability distributions, this model can be seen
as the quantum analogue of the model of testing properties of probability distributions. We note
that the problem of testing properties of mixed states has also appeared in the area of quantum
algorithms. For example, the work of [CHW07] considers Graph Isomorphism algorithms which
output a mixed state ρ satisfying a certain property if and only if the input graphs are isomorphic.

In this work, we focus on the problem of testing so-called unitarily invariant properties. These
are properties 𝒫 for which ρ satisfies 𝒫 if and only if UρU† satisfies 𝒫 for every unitary matrix U.
It is easy to see that whether a mixed state ρ has such a property depends only on ρ’s spectrum
(hence the name quantum spectrum testing). Many natural properties of mixed states are unitarily
invariant, such as being the maximally mixed state, having low rank, or having low von Neumann
entropy. (An example of a natural property which is not unitarily invariant is the property of being
equal to a fixed mixed state σ, so long as σ is not the maximally mixed state.) Though it is not
immediately apparent from the definitions (we will show this in Section 2.2), the model of testing
properties of mixed states from Definition 1.1 is equivalent to the following definition in the case
that the property in question is unitarily invariant.

Definition 1.2. A property of spectra 𝒫 is testable with f(d, ε) copies if for every d ≥ 2, ε > 0
there is an algorithm T which, when given f(d, ε) copies of a mixed state ρ ∈ ℂ^{d×d} with spectrum
η = (η_1, ..., η_d), behaves as follows:

• If η satisfies 𝒫, then Pr[T accepts] ≥ 2/3.

• If η is ε-far in total variation distance from every η′ satisfying 𝒫, then Pr[T rejects] ≥ 2/3.

The main gain in using Definition 1.2 over Definition 1.1 is that we only have to reason about
a total variation distance involving η rather than a trace distance involving ρ, which is in general
a more complicated distance measure. We note that the spectrum of a matrix is more properly
thought of as an unordered multiset of eigenvalues rather than an ordered tuple, and therefore any
property of spectra 𝒫 by necessity depends only on the multiset of values {η_1, ..., η_d} and not on
their ordering. Hence, quantum spectrum testing corresponds in the classical world to the model of
testing symmetric properties of probability distributions. As we will soon see, Definition 1.2 allows
us to show a formal correspondence between these two models.

1.1 Classical property testing of probability distributions 

The topic of property testing was introduced by Rubinfeld and Sudan in [RS92, RS96] in the context
of testing algebraic properties of polynomials over finite fields. Since then, it has found applications
in a wide variety of areas, including testing properties of graphs and of Boolean functions. Over
the past fifteen years, an extremely successful branch of property testing, first explicitly defined
in [BFR+00, BFR+13], has focused on testing properties of discrete probability distributions. In
the model of testing properties of probability distributions, there is an unknown distribution 𝒟 on
the set {1, ..., d}, and the tester may draw a random word of length n from 𝒟^{⊗n}; i.e., obtain a
sequence of n i.i.d. samples from 𝒟. Its goal is to decide whether 𝒟 has some property 𝒫 or is ε-far
from 𝒫 in total variation distance, while minimizing n.

It is well known [DL01, pages 10 and 31] (cf. [Dia14, Slide 6]) that after taking n = Θ(d/ε²)
samples from 𝒟, the empirical distribution is ε-close to 𝒟 with high probability. As a result, any
property of probability distributions is testable with a linear (in d) number of samples; thus research
in this area is directed at finding algorithms of sublinear sample complexity for various properties.
That such algorithms could exist is suggested by the following Birthday Paradox-based fact:

Fact 1.3. Θ(√r) samples are necessary and sufficient to distinguish between the cases when the
distribution is uniform on either r or 2r values. (The bound also holds for r vs. r′ when r′ ≥ 2r.)

Setting r = d/2, we see that this fact gives a sublinear algorithm for distinguishing between the
uniform distribution and a distribution that is uniform on exactly half of the elements of {1, ..., d}.
This fact is also important as it immediately gives a lower bound of Ω(√d) for testing a variety of
natural problems, those for which Fact 1.3 appears as a special case.
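The Birthday Paradox intuition behind Fact 1.3 can be made concrete with a short computation. This is a heuristic illustration only, not a proof of the fact: it computes the exact probability that n uniform samples from r values contain no repeated value, which is the regime in which a sample looks the same whether the support has size r or 2r. The function name `no_collision_probability` is hypothetical.

```python
from fractions import Fraction


def no_collision_probability(n: int, r: int) -> Fraction:
    """Probability that n i.i.d. uniform draws from r values are all distinct."""
    prob = Fraction(1)
    for i in range(n):
        # The (i+1)-th draw must avoid the i distinct values already seen.
        prob *= Fraction(r - i, r)
    return prob
```

For n much smaller than √r the returned value is close to 1 for both r and 2r, so collisions carry almost no information; once n is a large multiple of √r the two values separate.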

Perhaps the most basic property of probability distributions one can test for is the property of
being equal to the uniform distribution, Unif_d. An Ω(√d) lower bound follows directly from Fact 1.3.
On the other hand, an O(√d/ε⁴) upper bound was shown in the early work of [BFR+00, BFR+13]
using techniques of [GR11]. The correct sample complexity was finally pinned down by Paninski
in [Pan08], who showed matching upper and lower bounds:

Theorem 1.4 ([Pan08]). Θ(√d/ε²) samples are necessary and sufficient to test whether 𝒟 is the
uniform distribution Unif_d.
This result was recently extended [VV14] to an O(√d/ε²) upper bound for testing equality to
any fixed distribution, improving on the previously known [BFF+01] upper bound of Õ(√d/ε⁴).
More precisely, [VV14] upper-bounds the sample complexity of testing equality to a fixed distribution
𝒟 by O(f(𝒟)/ε²), where f(𝒟) is a certain norm which is maximized when 𝒟 is the uniform
distribution. Thus the uniform distribution is the hardest fixed distribution to test equality to.

The property of being the uniform distribution falls within the class of symmetric properties of
probability distributions. These are the properties 𝒫 for which 𝒟 = (p_1, ..., p_d) ∈ 𝒫 if and only
if (p_{π(1)}, ..., p_{π(d)}) ∈ 𝒫 for every permutation π. Other interesting symmetric properties include
having small entropy or small support size. Testing for small support size does not appear to
have been precisely addressed in the literature; however the following is easy to derive from known
results (in particular, the lower bound follows from the work of [VV11a]):

Theorem 1.5. To test (with ε a constant) whether a probability distribution has support size r,
O(r) samples are sufficient and Ω(r/log(r)) samples are necessary.

Let us now relate this section back to the main topic of this paper. As we saw earlier, the spectrum
of a mixed state can be thought of as a probability distribution on the numbers {1, ..., d}
(indexing the associated eigenvectors); thus any property of mixed state spectra is simply a symmetric
property of probability distributions. This correspondence allows us to directly compare
the difficulty of testing properties of mixed state spectra and of probability distributions. In fact,
the quantum case is always at least as difficult as the classical case; the reason is that the classical
problem is equivalent to the quantum problem under the promise that the n “samples” provided
are known orthogonal pure states, |1⟩, ..., |d⟩. Alternatively, in Sections 2.3.2 and 2.5 we
will observe the following purely classical characterization of quantum spectrum testing:

Fact 1.6. Let 𝒫 be a symmetric property of probability distributions on {1, ..., d}. Testing whether
the spectrum of a d-dimensional quantum mixed state satisfies 𝒫 is equivalent to the following
classical testing problem: Test whether a probability distribution 𝒟 satisfies 𝒫 when one is not
allowed to see the whole random word w ∼ 𝒟^{⊗n}, but only the following d statistics: the length of
the longest k-increasing subsequence of w, for each 1 ≤ k ≤ d. Here a k-increasing subsequence
means a disjoint union of k weakly increasing subsequences.
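The d statistics in Fact 1.6 can be computed for a concrete word. The sketch below is an illustration under one assumption not stated in this passage: it relies on Greene's theorem, by which the length of the longest k-increasing subsequence of a word equals the sum of the first k row lengths of the tableau produced by Robinson-Schensted-Knuth row insertion. The function names are hypothetical.

```python
from bisect import bisect_right


def rsk_shape(word):
    """Row lengths of the tableau obtained by RSK row insertion of `word`."""
    rows = []  # rows of the insertion tableau, each weakly increasing
    for x in word:
        for row in rows:
            j = bisect_right(row, x)  # index of the first entry strictly greater than x
            if j == len(row):
                row.append(x)  # x fits at the end of this row
                break
            row[j], x = x, row[j]  # bump the displaced entry into the next row
        else:
            rows.append([x])  # the bumped entry starts a new row
    return [len(row) for row in rows]


def longest_k_increasing(word, k):
    """Length of the longest k-increasing subsequence, via Greene's theorem."""
    return sum(rsk_shape(word)[:k])
```

For the word (1, 3, 2, 1, 2) the tableau has shape (3, 1, 1), so the three statistics are 3, 4, and 5.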

In light of the above remarks we record the following fact: 

Fact 1.7. Let 𝒫 be a symmetric property of probability distributions which requires f(d, ε) samples
to test classically. Then testing whether a mixed state’s spectrum satisfies 𝒫 also requires at least
f(d, ε) copies of the mixed state.

Although quantum spectrum testing is at least as hard as testing symmetric properties of
probability distributions, there are some interesting nontrivial properties which have the same
complexity in both models (up to constant factors). For example, if 𝒫 is the property of having
support size 1, then Θ(1/ε) samples/copies are necessary and sufficient to test 𝒫 in both models
(see [MdW13] for the O(1/ε) quantum spectrum testing upper bound). In general, however, it is
known that spectrum testing can require an asymptotically higher complexity (at least in terms of
the parameter d).

We end this section by pointing out that a large portion of the property testing literature
concerning entropy and support size actually considers the problems of either computing these
values [Pan04, BDKR05, VV11a, VV11b] (within some tolerance) or distinguishing between the
cases when these values are either large or small [Val08] (often these problems have some added
guarantee on the probability distribution, such as all of its nonzero probabilities being sufficiently
large). These problems, strictly speaking, do not fit within the above property testing framework.
In this work, when we consider the problem of testing a mixed state’s rank (the quantum analogue
of support size) we will be doing so explicitly within the property testing framework.

1.2 Related work 

Returning to quantum spectrum testing, we would like to mention two prior lines of research that
are directly relevant. The first is an algorithm, which we call the empirical Young diagram (EYD)
algorithm, for learning the spectrum of an unknown mixed state. This algorithm is naturally suggested
by the early work of Alicki, Rudnicki, and Sadowski [ARS88] and was explicitly proposed by
Keyl and Werner [KW01]. Regarding its performance guarantee, Hayashi and Matsumoto [HM02]
gave explicit error bounds and a short proof, but their work contained some small calculational
errors, subsequently corrected by Christandl and Mitchison [CM06]. From the last of these it is
easy to deduce the following:

Theorem 1.8. The empirical Young diagram algorithm, when given O(d²/ε² · ln(d/ε)) copies of a
mixed state ρ with spectrum η, outputs with high probability an estimate of η that is ε-close in total
variation distance.

We will give a description of this algorithm later in the paper; for now, suffice it to say that it
can be viewed as the quantum version of the natural classical algorithm for learning an unknown
distribution, viz., outputting the empirical distribution. The EYD algorithm gives a near-quadratic
improvement over known quantum state tomography algorithms for the problem of estimating a
mixed state’s spectrum.¹ As a result, testing properties of quantum spectra is easy with a quadratic
number of copies, and so we hope for subquadratic algorithms.

The second result comes from the work of Childs et al. [CHW07]. It can be thought of as a 
quantum analogue of Fact 1.3: 

Theorem 1.9. Θ(r) copies of a state ρ are necessary and sufficient to distinguish between the cases
when ρ’s spectrum is uniform on either r or 2r values. (The bound also holds for r vs. cr when
c ≥ 2 is an integer.)

Setting r = d/2, Theorem 1.9 gives a linear lower bound of Ω(d) for various properties of spectra.
This is in contrast with property testing of probability distributions, in which sublinear algorithms
are the main goal, with the Birthday Paradox typically precluding sub-O(√d)-sample algorithms.

Finally, we mention that we may also obtain relevant results by applying Fact 1.7 to known 
lower bounds for classical property testing of probability distributions. Though in general these 
lower bounds are not tight, prior to our work this was (to our knowledge) the only way to produce 
lower bounds for testing spectra with a dependence on ε.

1.3 Our results 

We have four main results. The first concerns the property that Montanaro and de Wolf refer to 
as Mixedness: 

Theorem 1.10. Θ(d/ε²) copies are necessary and sufficient to test whether ρ ∈ ℂ^{d×d} is the
maximally mixed state; i.e., whether its spectrum is η = (1/d, ..., 1/d).

¹ One may note that the dependence on ε in Theorem 1.8 is slightly worse than that for full tomography; however,
we speculate that this is an artifact of the analysis and that O(d²/ε²) copies suffice for the EYD algorithm.

This is the quantum analogue of Paninski’s Theorem 1.4. We also remark that given the way we
prove Theorem 1.10, Childs et al.’s Theorem 1.9 can be obtained as a very special case.

Our second result gives new bounds for testing whether a state has low rank. 

Theorem 1.11. Θ(r²/ε) copies are necessary and sufficient to test whether ρ ∈ ℂ^{d×d} has rank r
with one-sided error. With two-sided error, a lower bound of Ω(r/ε) holds.

We note that the copy complexity is independent of the ambient dimension d. Knowing that a 
state is low rank can often make solving a given problem much simpler. For example, quantum 
state tomography can be made more efficient when the state is known to be low-rank [FGLE12]. 
Compare this to Theorem 1.5. 

Next, we extend Childs et al.’s Theorem 1.9 to r vs. r′ for any r + 1 ≤ r′ ≤ 2r. A qualitative
difference is seen when r′ = r + 1; namely, nearly quadratically many copies are necessary.

Theorem 1.12. Let 1 ≤ Δ ≤ r. Then O(r²/Δ) copies are sufficient to distinguish between the
cases when ρ’s spectrum is uniform on either r or r + Δ eigenvalues; further, a nearly matching
lower bound of Ω̃(r²/Δ) copies holds.

As above, we note that these bounds are independent of the ambient dimension d. 

Our final results concern the EYD algorithm from Theorem 1.8. First, we give an arguably
simpler proof of Theorem 1.8. Next, we complement this with a lower bound showing that the
analysis of the EYD algorithm from Theorem 1.8 is tight up to logarithmic factors.

Theorem 1.13. If ρ ∈ ℂ^{d×d} is the maximally mixed state, the algorithm from Theorem 1.8 fails
to give an ε-accurate estimate (with high probability) unless Ω(d²/ε²) copies are used.

To our knowledge, no such lower bound was known previously. We remark that it is an interesting 
open question whether some other algorithm can estimate an unknown state’s spectrum from a 
subquadratic number of copies. 

1.4 Overview of our techniques 

Following [ARS88, Har05, CM06, CHW07], we use techniques from the representation theory of the
symmetric group S_n. A basic tool is Schur-Weyl duality, which decomposes the space (ℂ^d)^{⊗n} as

$$(\mathbb{C}^d)^{\otimes n} \cong \bigoplus_{\lambda \vdash n} P_\lambda \otimes Q_\lambda^d, \tag{1}$$

where the subspace P_λ corresponds to the symmetric group, the subspace Q^d_λ corresponds to the
unitary group, and λ is a partition of n, thought of as a Young diagram. (Recall that a partition of n
is a tuple λ = (λ_1, ..., λ_ℓ) satisfying λ_1 ≥ ... ≥ λ_ℓ ≥ 0 and λ_1 + ... + λ_ℓ = n.) In our testing problem,
the tester is provided with ρ^{⊗n}, which is invariant under any permutation of the n coordinates,
and whether the tester accepts or rejects should be invariant under any unitary transformation
of ρ. This means that if we measure ρ^{⊗n} in the Schur basis described in equation (6) below, we can
throw away the information from the permutation and unitary registers without losing any relevant
information. What is left is only the “irrep” label λ.

The end result is this: there is a sampling algorithm, referred to in [CHW07] as weak Schur
sampling, which, on input a mixed state ρ^{⊗n}, outputs a random partition λ whose distribution
depends only on the spectrum of ρ. We will denote this distribution by SW^n_ρ. Furthermore, an
argument which is essentially from [CHW07] (though see [MdW13, Lemma 19] for a full statement)
shows that for any spectrum property 𝒫, there is an optimal tester in the model of Definition 1.2
whose operation is as follows: 1. Sample λ ∼ SW^n_ρ. 2. Accept or reject based only on λ. We may
therefore proceed without loss of generality by analyzing only algorithms of this form. In particular,
this means we need not study quantum measurements or algorithms per se; in principle it
suffices simply to understand the distribution SW^n_ρ (which is equivalent to the distribution on
k-increasing subsequence lengths described in Fact 1.6).
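Since the output of weak Schur sampling depends only on the spectrum, the random partition can be simulated classically for small experiments. The following is a minimal sketch, assuming the classical characterization of Fact 1.6 combined with Greene's theorem (the k-increasing subsequence lengths of a word are the partial sums of the row lengths of its Robinson-Schensted-Knuth tableau). It is a simulation aid, not the quantum measurement itself, and the names `rsk_shape` and `sample_shape` are hypothetical.

```python
import random
from bisect import bisect_right


def rsk_shape(word):
    """Row lengths of the tableau obtained by RSK row insertion of `word`."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)  # first entry strictly greater than x
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]  # bump the displaced entry downward
        else:
            rows.append([x])
    return [len(row) for row in rows]


def sample_shape(spectrum, n, rng):
    """Draw a word of n i.i.d. letters from `spectrum` on {1..d}; return its RSK shape."""
    letters = range(1, len(spectrum) + 1)
    word = rng.choices(letters, weights=spectrum, k=n)
    return rsk_shape(word)
```

A call such as `sample_shape([0.5, 0.3, 0.2], 50, random.Random(0))` returns a partition of 50 with at most three parts; a spectrum with a single nonzero entry always yields the one-row partition (n).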

In case ρ is the maximally mixed state, the distribution SW^n_ρ has been fairly well studied,
starting with the works [TW01, Joh01, Bia01, Kup02] (see [Mel10a] for a recent, comprehensive
treatment). It is known as the Schur-Weyl distribution, and we denote it by SW^n_d. (In the limit as
d → ∞, it approaches the well-known Plancherel distribution.) The exact distribution on partitions
given by SW^n_d is somewhat complicated and difficult to work with, and so various works have instead
sought to describe large-scale features of a “typical” λ ∼ SW^n_d. For example, Biane [Bia01] showed
that, up to small fluctuations, the “shape” of the random Young diagram λ ∼ SW^n_d tends toward
a certain limiting shape Ω which depends only on the ratio √n/d. Furthermore, Meliot [Mel10a]
has characterized these small fluctuations as being distributed according to a certain Gaussian
process. The second of these results borrows heavily from a proof of the analogous result by Kerov
(see [IO02]) for the Plancherel distribution, and we will give an overview of his techniques below.

Kerov’s approach involves studying a certain space of symmetric polynomial functions on Young
diagrams. For example, if one is interested in showing that a random λ ∼ SW^n_ρ tends to have some
coordinates which are much larger than the rest, then it would be natural to study “moments” of
the form Σ_i λ_i^k. However, the approach of Kerov would suggest studying the following “moments”
instead:

$$p^*_k(\lambda) := \sum_{i=1}^{\infty} \left[ \left(\lambda_i - i + \tfrac{1}{2}\right)^k - \left(-i + \tfrac{1}{2}\right)^k \right].$$
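Although the sum defining these “moments” is infinite, every term with λ_i = 0 vanishes, so it can be evaluated exactly from the nonzero parts alone. The sketch below does this with exact rational arithmetic; the function name `p_star` is hypothetical.

```python
from fractions import Fraction


def p_star(k, partition):
    """Evaluate p*_k on a partition given as a nonincreasing list of parts."""
    half = Fraction(1, 2)
    total = Fraction(0)
    for i, part in enumerate(partition, start=1):
        # Terms with part == 0 contribute exactly 0, so trailing zeros are harmless.
        total += (part - i + half) ** k - (-i + half) ** k
    return total
```

Two quick sanity checks follow directly from the definition: p*_1(λ) is the size |λ|, and p*_2(λ) equals Σ_i λ_i(λ_i − 2i + 1), which is twice the sum over boxes of (column index minus row index).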

The polynomial family (p*_k) inhabits (in fact, generates) the so-called algebra of polynomial functions
on the set of Young diagrams Λ* (also known as Kerov’s algebra of observables). There are
other important polynomial families within Λ* (in addition to the p*_k polynomials, our work involves
the p̃_k, c_k, p♯_µ, and s*_µ polynomials), and each of these families sheds light on a different
aspect of the input partition λ. For example, though the p♯_µ(λ) polynomials don’t give any obvious
information regarding the “shape” of λ, they are unique in that we can easily compute the
expectation E_{λ∼SW^n_ρ}[p♯_µ(λ)] for any mixed state ρ. There exist some methods for passing from one
polynomial family to another, and it is often the case that a problem most easily stated in terms
of one polynomial family is most easily solved in terms of another.

The main component of our work is lower bounds for quantum spectrum testing, and these
lower bounds generally have the following outline: 1. Reduce the problem to showing that a certain
expression within the algebra of observables is small with high probability. 2. Use various
polynomial-estimation techniques developed by Kerov and others for proving concentration of said
expression. For example, roughly speaking the key component of the lower bound in Theorem 1.12
is showing that for n <C the expression 

(-l)Vfc(A) 

^2 + 


is typically very close to 0 when λ ∼ SW^n_r. As another example, proving the lower bound in
Theorem 1.10 reduces to showing that when n ≪ d/ε², the expression


E 

partitions /i of n 
with at most d nonzeros 


—2 e,..., -|-2e, —2e) 

nti n“i{<i+(i-i)) 




is typically very close to 1 when λ ∼ SW^n_d. Our upper bounds generally involve analyzing algorithms
which accept or reject based on simple statistics of the sampled λ ∼ SW^n_ρ. For example, the
rank tester of Theorem 1.11 accepts if and only if the sampled λ has at most r nonzero parts,
and the uniformity tester of Theorem 1.10 accepts if and only if the “content polynomial” c_1(λ)
is sufficiently small. As in the lower bounds, analyzing these algorithms uses techniques from
the algebra of observables, and we sometimes also require certain combinatorial interpretations
of the weak Schur sampling algorithm; e.g., its relationship with the Robinson-Schensted-Knuth
“bumping” algorithm.
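The two statistics named above are simple functions of the sampled partition. The sketch below assumes the standard definition of the content of a box (its column index minus its row index) and takes c_1(λ) to be the sum of contents over all boxes; this passage does not spell out that definition or the acceptance threshold for the uniformity tester, so no threshold is implemented. The function names are hypothetical.

```python
def accepts_rank(partition, r):
    """Rank-tester rule from the text: accept iff at most r parts are nonzero."""
    return sum(1 for part in partition if part > 0) <= r


def content_sum(partition):
    """Sum over all boxes of (column index - row index), both indices starting at 1."""
    return sum(
        j - i
        for i, part in enumerate(partition, start=1)
        for j in range(1, part + 1)
    )
```

A long first row makes the content sum large and positive, while a tall first column makes it large and negative, which is one way to see why this statistic reflects how unbalanced the diagram is.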
2 Preliminaries


2.1 Probabilistic distances 


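Given two discrete probability distributions 𝒟_1 and 𝒟_2 on a finite set Ω, the total variation distance
between them is

$$d_{\mathrm{TV}}(\mathcal{D}_1, \mathcal{D}_2) := \tfrac{1}{2} \sum_{\omega \in \Omega} \left| \mathcal{D}_1(\omega) - \mathcal{D}_2(\omega) \right|.$$

We will also require some nonsymmetric “distances” between probability distributions. The
chi-squared distance is

$$d_{\chi^2}(\mathcal{D}_1, \mathcal{D}_2) := \mathop{\mathbf{E}}_{\omega \sim \mathcal{D}_2} \left[ \left( \frac{\mathcal{D}_1(\omega)}{\mathcal{D}_2(\omega)} - 1 \right)^2 \right].$$

Further, if supp(𝒟_1) ⊆ supp(𝒟_2), then the Kullback-Leibler divergence is

$$d_{\mathrm{KL}}(\mathcal{D}_1, \mathcal{D}_2) := \mathop{\mathbf{E}}_{\omega \sim \mathcal{D}_1} \left[ \ln \frac{\mathcal{D}_1(\omega)}{\mathcal{D}_2(\omega)} \right].$$

To relate these quantities, Cauchy-Schwarz implies that d_TV(𝒟_1, 𝒟_2) ≤ ½ √(d_χ²(𝒟_1, 𝒟_2)), and
Pinsker’s inequality states that d_TV(𝒟_1, 𝒟_2)² ≤ ½ d_KL(𝒟_1, 𝒟_2).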
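The three quantities and the two inequalities relating them can be checked numerically on small examples. The sketch below represents a distribution as a list of probabilities indexed by the elements of the ground set; it assumes the second argument has full support for the chi-squared distance, and the function names are hypothetical.

```python
from math import log, sqrt


def d_tv(d1, d2):
    """Total variation distance: half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))


def d_chi2(d1, d2):
    """Chi-squared distance: expectation under d2 of (d1/d2 - 1)^2 (needs d2 > 0)."""
    return sum(b * (a / b - 1) ** 2 for a, b in zip(d1, d2))


def d_kl(d1, d2):
    """KL divergence: expectation under d1 of ln(d1/d2); zero-mass terms contribute 0."""
    return sum(a * log(a / b) for a, b in zip(d1, d2) if a > 0)
```

For 𝒟_1 = (1/2, 1/2) and 𝒟_2 = (1/4, 3/4) the total variation distance is 1/4 and the chi-squared distance is 1/3, consistent with both the Cauchy-Schwarz bound and Pinsker’s inequality.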

We would also like to introduce a “permutation-invariant” notion of total variation distance.
Suppose that the set Ω is naturally ordered; say, Ω = [d] := {1, 2, ..., d}. We define

$$d^{\mathrm{sym}}_{\mathrm{TV}}(\mathcal{D}_1, \mathcal{D}_2) := d_{\mathrm{TV}}(\mathcal{D}_1^{\downarrow}, \mathcal{D}_2^{\downarrow}) = \min_{\pi \in S_d} \{ d_{\mathrm{TV}}(\mathcal{D}_1, \mathcal{D}_2 \circ \pi) \}.$$

Here 𝒟_j^↓ denotes the probability distribution on [d] given by rearranging 𝒟_j’s probabilities in
nonincreasing order, so 𝒟_j^↓(1) ≥ ... ≥ 𝒟_j^↓(d). By virtue of the permutation-invariance, we may also
naturally extend this notation to the case when 𝒟_1 and 𝒟_2 are simply unordered multisets of
nonnegative numbers summing to 1.
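The equality between the sorted form and the minimum over permutations can be checked by brute force for small d. In this sketch the sorted version is the practical way to compute the distance, and the brute-force version exists only to confirm the stated identity on examples; the function names are hypothetical.

```python
from itertools import permutations


def d_tv(d1, d2):
    """Total variation distance: half the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(d1, d2))


def d_tv_sym(d1, d2):
    """Permutation-invariant distance: compare after sorting in nonincreasing order."""
    return d_tv(sorted(d1, reverse=True), sorted(d2, reverse=True))


def d_tv_sym_bruteforce(d1, d2):
    """Same quantity as a minimum over all rearrangements of d2 (small d only)."""
    return min(d_tv(d1, perm) for perm in permutations(d2))
```

For example, (0.1, 0.6, 0.3) and (0.5, 0.2, 0.3) are at ordinary total variation distance 0.4 but at permutation-invariant distance 0.1.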

A d-dimensional mixed quantum state is represented as a density matrix ρ ∈ ℂ^{d×d}; that is, a
positive semidefinite matrix with trace 1. We may write ρ using its spectral decomposition as

$$\rho = \sum_{i=1}^{d} \eta_i \, |\Psi_i\rangle\langle\Psi_i|,$$

where the |Ψ_i⟩’s are orthonormal and the η_i’s are nonnegative reals satisfying η_1 + ... + η_d = 1.
Equivalently, ρ describes a probability distribution on pure states in which |Ψ_i⟩ has probability η_i.
If σ is another d-dimensional mixed state with eigenvalues {λ_1, ..., λ_d} (thought of as a multiset),
we will use the notation

$$d^{\mathrm{sym}}_{\mathrm{TV}}(\rho, \sigma) := d^{\mathrm{sym}}_{\mathrm{TV}}(\{\eta_1, \dots, \eta_d\}, \{\lambda_1, \dots, \lambda_d\}).$$

We will now define trace distance, which is the standard notion of distance between two density
matrices. (The above nonstandard notion of distance will be related to the trace distance in
Proposition 2.2 below.) If M ∈ ℂ^{d×d} is any Hermitian matrix with eigenvalues η_1, ..., η_d, the trace
norm of M is

$$\|M\|_{\mathrm{tr}} := \mathrm{tr}\left(\sqrt{M^\dagger M}\right) = \sum_{i=1}^{d} |\eta_i|.$$

Given two density matrices ρ and σ, the trace distance between them is d_tr(ρ, σ) := ½‖ρ − σ‖_tr.
The trace distance is the standard generalization of the total variation distance to mixed states; for
example, it represents the maximum probability with which two mixed states can be distinguished
by a measurement [NC10, equation (9.22)]. This property makes it the natural choice of distance
for property testing of quantum states. We also have the following simple fact:

Fact 2.1. Suppose ρ and σ are diagonal density matrices with diagonal entries η = (η_1, ..., η_d)
and λ = (λ_1, ..., λ_d), respectively. Then d_tr(ρ, σ) = d_TV(η, λ).

2.2 Property testing 

In the model of property testing, there is a set of objects 𝒪 along with a distance measure dist :
𝒪 × 𝒪 → ℝ. A property 𝒫 is a subset of 𝒪, and for an object o ∈ 𝒪, we define the distance of
o to 𝒫 to be²

$$\mathrm{dist}(o, \mathcal{P}) := \min_{o' \in \mathcal{P}} \{ \mathrm{dist}(o, o') \}.$$

If dist(o, 𝒫) ≥ ε, then we say that o is ε-far from 𝒫. A testing algorithm T tests 𝒫 if, given some
sort of “access” to o ∈ 𝒪 (e.g., independent samples or queries), T accepts if o ∈ 𝒫 and rejects if o is
ε-far from 𝒫. Generally, the aim is for T to be efficient according to some measure, most typically the
number of accesses made to o. (On the other hand, T is generally allowed unlimited computational
power. Nevertheless, as we will see, all of the testers considered in this paper can be implemented
efficiently.)

We will instantiate property testing in the following natural settings: 

(i) Properties of mixed states: 𝒪 is the set of d-dimensional mixed states ρ, the tester gets
access to (unentangled) copies of ρ, and dist = d_tr.

(ii) Unitarily invariant properties of mixed states: As above, but 𝒫 must be unitarily
invariant; equivalently, whether or not ρ ∈ 𝒫 only depends on the multiset of ρ’s eigenvalues.

(iii) Quantum spectrum testing: 𝒪 is the set of d-dimensional mixed states, 𝒫 must be
unitarily invariant, and dist(ρ, σ) = d^sym_TV(ρ, σ).

(iv) Symmetric properties of probability distributions: 𝒪 is the set of probability distributions
𝒟 on [d], the tester gets i.i.d. draws from 𝒟, 𝒫 is any symmetric property, and
dist = d_TV.

² Formally, our sets 𝒪 will always lie within some ℝ^N or ℂ^N, and we always require that 𝒫 be a closed set. Thus
the “min” here is well-defined.

Let us now establish some basic facts about these models. The simplest fact is that Model (ii) is
a special case of Model (i). Next, in Model (iv) it would be equivalent if we had chosen dist = d^sym_TV;
this is by virtue of the assumption that 𝒫 is a symmetric (permutation-invariant) property of
distributions on [d]. Finally, we have the following important simplifying fact, whose proof is not
trivial:

Proposition 2.2. Models (ii) and (iii) are equivalent.

Proof. We need to show that if 𝒫 is a unitarily invariant property of d-dimensional mixed states
then d_tr(ρ, 𝒫) = d^sym_TV(ρ, 𝒫) holds for all mixed states ρ. By performing a unitary transformation,
we may assume without loss of generality that ρ is a diagonal matrix with nonincreasing diagonal
entries (spectrum).

The easy direction of the proof is showing that d_tr(ρ, 𝒫) ≤ d^sym_TV(ρ, 𝒫). To see this, suppose
σ ∈ 𝒫 achieves d^sym_TV(ρ, σ) = ε. Let σ′ denote the diagonal density matrix whose diagonal entries
are the eigenvalues of σ arranged in nonincreasing order. Now σ′ is unitarily equivalent to σ, and
hence σ′ ∈ 𝒫 as well. But d_tr(ρ, σ′) = ε by Fact 2.1 and we therefore conclude d_tr(ρ, 𝒫) ≤ ε, as
needed.

The more interesting direction is showing that d^sym_TV(ρ, 𝒫) ≤ d_tr(ρ, 𝒫). The authors learned the
proof of this fact from Ashley Montanaro [Mon14]. Suppose that σ ∈ 𝒫 achieves d_tr(ρ, σ) = ε.
Since ‖·‖_tr is a unitarily invariant norm, a theorem of Mirsky (see [HJ13, Corollary 7.4.9.3]) states
that

$$\|\rho - \sigma\|_{\mathrm{tr}} \geq \|\rho' - \sigma'\|_{\mathrm{tr}}, \tag{2}$$

where σ′ (respectively, ρ′) denotes the diagonal density matrix whose entries are the eigenvalues
of σ (respectively, ρ) arranged in nonincreasing order. We have ρ′ = ρ, and σ′ is again unitarily
equivalent to σ, implying σ′ ∈ 𝒫. But the left-hand side of (2) is 2ε, and the right-hand side is
2 d_TV(ρ′, σ′) (by Fact 2.1), which in turn equals 2 d^sym_TV(ρ, σ′). Thus d^sym_TV(ρ, 𝒫) ≤ ε, as needed. □

Finally, we remind the reader of Fact 1.7, which says that any quantum spectrum testing 
problem (in either of the equivalent Models (ii) and (iii)) is at least as hard as the corresponding 
classical problem in Model (iv). 

2.3 Partitions and Young diagrams 

A partition of $n \geq 1$, denoted $\lambda \vdash n$, is a list of nonnegative integers $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_k)$ satisfying
$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k$ and $\lambda_1 + \lambda_2 + \cdots + \lambda_k = n$. The length of the partition, denoted $\ell(\lambda)$, is the
number of nonzero $\lambda_i$'s in $\lambda$. The partition's size is $n$, and is also written as $|\lambda|$. Two partitions are
considered to be equivalent if they only differ in trailing zeros. For example, $(4, 2)$ and $(4, 2, 0, 0)$
are equivalent. We write Par to denote the set of all partitions, of any size. For $w \in \mathbb{N}^+$ we will
use the notation $m_w(\lambda)$ to denote the number of parts $i$ with $\lambda_i = w$. Finally, at one point we
will require the fairly elementary fact (see e.g. [Rom14, (1.15)]) that the number of partitions of $n$
is $2^{O(\sqrt{n})}$ (much more precise asymptotics are known [HR18]).

One way in which partitions arise is as cycle types of permutations $\pi \in \mathfrak{S}_n$. We say that $\pi$ has
cycle type $\lambda = (\lambda_1, \dots, \lambda_k) \vdash n$ if $\pi$ is the product of disjoint cycles of size $\lambda_1, \lambda_2, \dots, \lambda_k$. (Note
that $\pi$'s length-1 cycles are included.) The standard notation for this is $\rho(\pi) = \lambda$. However we
will use this notation extremely sparingly (and with warning) so as to preserve the symbol “$\rho$”
for density matrices. In aid of this, we adopt the following convention: whenever a permutation $\pi$
appears in a place where a partition $\lambda$ is expected, the meaning is that $\lambda$ should be the cycle type
of $\pi$. We also use the following standard notation:

$$z_\lambda := \prod_{w \geq 1} \left( w^{m_w(\lambda)} \cdot m_w(\lambda)! \right).$$

When $\lambda \vdash n$, the quantity $n!/z_\lambda$ is the number of permutations in $\mathfrak{S}_n$ of cycle type $\lambda$, so $z_\lambda^{-1}$
represents the probability that a uniformly random permutation in $\mathfrak{S}_n$ has cycle type $\lambda$.

Figure 1: Two ways of drawing the partition $\lambda = (6, 4, 4, 3, 3)$. Panel (b) shows the Russian notation (in dashed lines); the marks on the horizontal axis are integral $x$-values, and the heavy black line is the curve $\lambda(x)$.
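The counting claim for $z_\lambda$ can be checked by brute force on a small symmetric group. The following is a minimal sketch, not part of the original text; the function names `z`, `cycle_type`, and `check_cycle_type_counts` are illustrative assumptions, and permutations are represented as tuples of images on $\{0, \dots, n-1\}$.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def z(lam):
    """z_lambda = product over part sizes w of w^(m_w) * (m_w)!."""
    result = 1
    for w, m in Counter(part for part in lam if part > 0).items():
        result *= w ** m * factorial(m)
    return result


def cycle_type(perm):
    """Cycle type of a permutation of {0, ..., n-1}, given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        length, j = 0, start
        while j not in seen:  # walk the cycle containing `start`
            seen.add(j)
            j = perm[j]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def check_cycle_type_counts(n):
    """Brute-force check that n!/z_lambda counts permutations of cycle type lambda."""
    counts = Counter(cycle_type(p) for p in permutations(range(n)))
    return all(count == factorial(n) // z(lam) for lam, count in counts.items())
```

For instance, `z((2, 1, 1))` evaluates to 4, matching the $4!/4 = 6$ transpositions in $\mathfrak{S}_4$.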

It is standard to represent a partition $\lambda \vdash n$ pictorially with a Young diagram; i.e., a certain
arrangement of $n$ squares, called cells or boxes. There are several conventions for how to draw
Young diagrams: we will define the French notation, the Russian notation, and the Maya notation.³

In the French notation, the Young diagram for $\lambda = (\lambda_1, \dots, \lambda_k)$ is drawn with left-justified
rows of cells: $\lambda_1$ cells in the bottom row, $\lambda_2$ cells on top of this, $\lambda_3$ cells on top of this, etc. As
an example, the French notation for $(6, 4, 4, 3, 3)$ is pictured in Figure 1a. We think of the French
diagram as consisting of unit squares sitting in $\mathbb{R}^2$, with bottom-left corner at the origin.

Given the French diagram, it's natural to define the width of $\lambda$ as $\lambda_1$, and to refer to $\ell(\lambda)$ as
its height. We can also define the conjugate partition of $\lambda$ to be the partition $\lambda' \vdash n$ obtained
by reflecting the French diagram through the line $y = x$; i.e., exchanging rows and columns. For
example, the conjugate of $\lambda = (6, 4, 4, 3, 3)$ is $\lambda' = (5, 5, 5, 3, 1, 1)$. Note that the height of $\lambda$ is the
width of $\lambda'$, and vice versa; in particular, we sometimes prefer the notation $\lambda'_1$ to $\ell(\lambda)$.
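Conjugation is easy to compute directly from the column heights of the French diagram. The sketch below is illustrative only; the name `conjugate` and the tuple representation of partitions are assumptions made here.

```python
def conjugate(lam):
    """Conjugate partition: the j-th part counts the rows of lam longer than j - 1."""
    lam = tuple(part for part in lam if part > 0)  # drop trailing zeros
    if not lam:
        return ()
    return tuple(sum(1 for part in lam if part > j) for j in range(lam[0]))
```

Applying `conjugate` twice returns the original partition, reflecting the fact that conjugation is a reflection of the diagram.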

We now define the Russian notation for $\lambda$. This is obtained from the French notation by first
rotating the diagram 45° counterclockwise about the origin, and then dilating by a factor of $\sqrt{2}$;
see Figure 1b. The purpose of the dilation is so that the corners of the boxes will have integer $x$-
and $y$-coordinates. The purpose of the rotation is so that conjugation corresponds to reflection in
the $y$-axis and so that the boundary of the diagram forms the graph of a function:


³ We will not require the English notation, which is the reflection of the French notation across the horizontal axis.

Definition 2.3. Given a partition $\lambda$ drawn in Russian notation, its upper boundary forms the
graph of a function with domain $[-\ell(\lambda), \lambda_1] \subset \mathbb{R}$. We extend this function to have domain all of $\mathbb{R}$
according to the function $x \mapsto |x|$. We will use the notation $\lambda : \mathbb{R} \to \mathbb{R}^+$ for this function, which we
remark is a continuous and piecewise linear curve. Any time we write $\lambda(x)$, where $\lambda$ is a partition
and $x \in \mathbb{R}$, we are referring to this curve. See Figure 1b for an example.

Finally, we define the Maya notation. It contains no boxes; just a sequence of black and white
pebbles. However the Maya notation is typically drawn in conjunction with the Russian notation,
with the pebbles being located on the half-integer points $\mathbb{Z} + \tfrac{1}{2}$ of the $x$-axis. In the Maya notation,
a black pebble is placed at all points directly below a “downward-sloping” segment in $\lambda$'s graph,
and a white pebble is placed at all points directly below an “upward-sloping” segment. (Thus all
sufficiently negative half-integer points have a black pebble and all sufficiently positive half-integer
points have a white pebble.) The notation also includes a vertical tick mark to denote the location
of the origin. A picture of the Russian and Maya notation for $\lambda = (6, 4, 4, 3, 3)$ appears later in
Figure 4 (the reader consulting it now should ignore the red and green coloring, the dashed lines,
and the box labeled “d”). One can check that the sequence of pebbles uniquely identifies the
partition $\lambda$. It also uniquely determines the position of the origin mark, in that the number of
black pebbles to the right of the origin mark always equals the number of white pebbles to the
left of the origin mark. These numbers are both equal to $d(\lambda)$, defined to be the number of cells
touching the $y$-axis in the Russian diagram. We make one more definition:

Definition 2.4. Given the Maya diagram of a partition $\lambda$, we may define its modified Frobenius
coordinates to be the half-integer values $a^*_1 > a^*_2 > \cdots > a^*_d > 0$ and $b^*_1 > b^*_2 > \cdots > b^*_d > 0$ (for
$d = d(\lambda)$), where $a^*_i$ is the position of the $i$th rightmost black pebble and $b^*_i$ is the negative of the
position of the $i$th leftmost white pebble. One may check that, equivalently, $a^*_i = \lambda_i - i + \tfrac{1}{2}$ and
$b^*_i = \lambda'_i - i + \tfrac{1}{2}$. For example, if $\lambda = (6, 4, 4, 3, 3)$, then $a^* = (\tfrac{11}{2}, \tfrac{5}{2}, \tfrac{3}{2})$ and $b^* = (\tfrac{9}{2}, \tfrac{7}{2}, \tfrac{5}{2})$. The
coordinates have the property that $\sum_{i=1}^{d} (a^*_i + b^*_i) = |\lambda|$.
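The closed-form description of the modified Frobenius coordinates lends itself to a direct computation. The following sketch uses exact rational arithmetic; the names `conjugate` and `modified_frobenius` are illustrative assumptions, and the coordinates are obtained by keeping only the positive values of $\lambda_i - i + \tfrac{1}{2}$ and $\lambda'_i - i + \tfrac{1}{2}$.

```python
from fractions import Fraction


def conjugate(lam):
    """Conjugate partition (column heights of the French diagram)."""
    lam = tuple(part for part in lam if part > 0)
    if not lam:
        return ()
    return tuple(sum(1 for part in lam if part > j) for j in range(lam[0]))


def modified_frobenius(lam):
    """Return (a*, b*) with a*_i = lam_i - i + 1/2 and b*_i = lam'_i - i + 1/2, keeping positive values."""
    half = Fraction(1, 2)
    lam = tuple(part for part in lam if part > 0)
    conj = conjugate(lam)
    a = [part - i + half for i, part in enumerate(lam, start=1)]
    b = [part - i + half for i, part in enumerate(conj, start=1)]
    return [v for v in a if v > 0], [v for v in b if v > 0]
```

On $\lambda = (6, 4, 4, 3, 3)$ this reproduces the example in Definition 2.4, and the two coordinate lists have the same length $d(\lambda) = 3$.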

For a partition $\lambda$ (drawn either in the French or Russian notation), we often use the symbol □
to denote a box in $\lambda$'s Young diagram. We write $[\lambda]$ for the set of all boxes in the diagram.
Each box $□ \in [\lambda]$ is indexed by an ordered pair $(i, j)$, where $i$ is □'s row and $j$ is □'s column. Note
that this indexing is slightly peculiar vis-a-vis the French notation, in which the center of □ has
Cartesian coordinates $(j - \tfrac{1}{2}, i - \tfrac{1}{2})$. We define the content of cell □ to be $c(□) := j - i$. Note that
in the Russian diagram, the content of □ is the $x$-coordinate of its center. We also define the hook
length $h(□)$ of □ via the French notation: it is the number of cells directly to the right or above □,
including □ itself; equivalently, it is $(\lambda_i - j) + (\lambda'_j - i) + 1$.

Having defined “content” for cells in a Young diagram, we may introduce some convenient
notation (essentially from [OO98b]) that generalizes the standard notions of “falling factorial power”
and “rising factorial power”. First, for $z \in \mathbb{R}$ and $m \in \mathbb{N}$, recall the falling factorial power⁴

$$z^{\downarrow m} := z(z-1)(z-2)\cdots(z-m+1)$$

and rising factorial power

$$z^{\uparrow m} := z(z+1)(z+2)\cdots(z+m-1).$$

We generalize this notation to the case of an arbitrary partition $\lambda \vdash m$:

$$z^{\downarrow \lambda} := \prod_{□ \in [\lambda]} (z - c(□)) \quad \text{and} \quad z^{\uparrow \lambda} := \prod_{□ \in [\lambda]} (z + c(□)).$$

⁴ Or Pochhammer symbol, sometimes denoted $(z)_m$ or $z^{\underline{m}}$.


    6 7 8
    5 6 6
    3 3 4 5
    2 2 3 3
    1 1 1 2 3 3

Figure 2: A semistandard tableau of shape $\lambda = (6, 4, 4, 3, 3)$ with alphabet $[8]$, drawn in the French notation.


2.3.1 Random words and Young diagrams, and symmetric polynomials 

Definition 2.5. Let $\mathcal{A}$ be an alphabet; i.e., a totally ordered set. Most often we consider $\mathcal{A} = [d]$.
A word is a finite sequence $(a_1, \dots, a_n)$ of elements from $\mathcal{A}$. It is weakly increasing if $a_1 \leq a_2 \leq \cdots \leq a_n$
and strongly (or strictly) increasing if $a_1 < a_2 < \cdots < a_n$. If $\mathcal{D}$ is a probability distribution
on $\mathcal{A}$ we write $\mathcal{D}^{\otimes n}$ to denote the probability distribution on words of length $n$ given by drawing
the letters independently from $\mathcal{D}$.

Definition 2.6. Given a word $a \in [d]^n$, there is an associated partition $\lambda \vdash n$ of length at most $d$
called the sorted type (or histogram). It is defined as follows: $\lambda_i$ is the frequency of the $i$th-most
frequent letter in $a$, for $1 \leq i \leq d$. In other words, $\lambda$ is the histogram of letter frequencies, sorted
into nonincreasing order. For example, the sorted type of $(4, 1, 3, 4, 4, 4, 1, 4) \in [4]^8$ is $(5, 2, 1, 0) \vdash 8$.
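As a small illustration of Definition 2.6, the sorted type can be computed by counting letters and sorting the counts. The function name `sorted_type` is an assumption made for this sketch; letters are taken from $\{1, \dots, d\}$.

```python
from collections import Counter


def sorted_type(word, d):
    """Letter frequencies of a word over [d] = {1, ..., d}, sorted into nonincreasing order."""
    counts = Counter(word)
    return tuple(sorted((counts.get(letter, 0) for letter in range(1, d + 1)), reverse=True))
```

Note that the output always has exactly $d$ entries, so trailing zeros are retained as in the example above.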

Definition 2.7. Let $x_1, \dots, x_d$ be indeterminates, typically standing for real numbers. For $m \in \mathbb{N}$,
the $m$th power sum symmetric polynomial is $p_m(x) = \sum_{i=1}^{d} x_i^m$. More generally, for a partition $\lambda$
we define $p_\lambda(x) = \prod_{i=1}^{\ell(\lambda)} p_{\lambda_i}(x)$. By our conventions, if $\pi \in \mathfrak{S}_n$ then $p_\pi(x)$ denotes $p_\lambda(x)$, where
$\lambda$ is the cycle type of $\pi$. If $\mathcal{D} = (\eta_1, \dots, \eta_d)$ is a probability distribution on $[d]$, there is a natural
interpretation of $p_\pi(\eta_1, \dots, \eta_d)$: it is the probability that a random word $a \sim \mathcal{D}^{\otimes n}$ is invariant
under the permutation $\pi$.
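The probabilistic interpretation of the power sum polynomial can be verified by enumerating all words for small parameters. This is a hedged sketch: the names `p_lambda` and `invariance_probability` are illustrative, permutations are tuples of images on $\{0, \dots, n-1\}$, and a word $a$ is treated as invariant when $a_{\pi(i)} = a_i$ for all $i$.

```python
from fractions import Fraction
from itertools import product


def p_lambda(lam, x):
    """Power sum polynomial p_lambda(x) = product over parts m of sum_i x_i^m."""
    result = 1
    for m in lam:
        if m > 0:
            result *= sum(xi ** m for xi in x)
    return result


def invariance_probability(perm, eta):
    """Probability that an i.i.d. word drawn from eta is unchanged when positions are permuted by perm."""
    n, total = len(perm), 0
    for word in product(range(len(eta)), repeat=n):
        if all(word[perm[i]] == word[i] for i in range(n)):
            prob = 1
            for letter in word:
                prob *= eta[letter]
            total += prob
    return total
```

A word is invariant exactly when it is constant on each cycle of the permutation, which is why each cycle of length $m$ contributes a factor $p_m(\eta)$.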

Definition 2.8. Let $\lambda \vdash n$, and think of its Young diagram in the French notation. If each cell is
filled with an element from some alphabet $\mathcal{A}$, we call the result a Young tableau of shape $\lambda$. The
Young tableau is said to be semistandard if its entries are weakly increasing from left-to-right along
rows and are strongly increasing from bottom-to-top along columns. Figure 2 gives an example
semistandard tableau of shape $(6, 4, 4, 3, 3)$. If the rows are in fact strongly increasing, the Young
tableau is called standard.

Definition 2.9. For reasons we will see later, the number of standard Young tableaus⁵ of shape
$\lambda \vdash n$ over alphabet $[n]$ is denoted $\dim(\lambda)$. It can be computed via the Hook-Length Formula of
Frame, Robinson, and Thrall [FRT54] (see also [Sta99, Corollary 7.21.6]):

$$\dim(\lambda) = \frac{n!}{\prod_{□ \in [\lambda]} h(□)}.$$
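The Hook-Length Formula translates directly into a few lines of code. The sketch below is illustrative; the helper names `conjugate`, `hook_lengths`, and `dim` are assumptions, and hook lengths are computed from the expression $(\lambda_i - j) + (\lambda'_j - i) + 1$ given earlier.

```python
from math import factorial, prod


def conjugate(lam):
    """Conjugate partition (column heights of the French diagram)."""
    return tuple(sum(1 for part in lam if part > j) for j in range(lam[0])) if lam else ()


def hook_lengths(lam):
    """Hook lengths of all cells, using 1-indexed rows i and columns j."""
    lam = tuple(part for part in lam if part > 0)
    conj = conjugate(lam)
    return [(lam[i - 1] - j) + (conj[j - 1] - i) + 1
            for i in range(1, len(lam) + 1)
            for j in range(1, lam[i - 1] + 1)]


def dim(lam):
    """Number of standard Young tableaus of shape lam, via the Hook-Length Formula."""
    n = sum(lam)
    return factorial(n) // prod(hook_lengths(lam))
```

For example, the shape $(3, 2)$ has hook lengths $4, 3, 1, 2, 1$, giving $5!/24 = 5$ standard tableaus.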

We will also consider counting semistandard tableaus, via the following definition:

⁵ Often spelled “tableaux”.

Definition 2.10. Let $x_1, \dots, x_d$ be indeterminates, typically standing for real numbers. Given
$\lambda \vdash n$, the Schur polynomial $s_\lambda(x_1, \dots, x_d)$ is the degree-$n$ homogeneous polynomial defined by

$$s_\lambda(x_1, \dots, x_d) := \sum_{T} x^T,$$

where the sum is over all semistandard tableaus $T$ of shape $\lambda$ over alphabet $[d]$, and where

$$x^T := \prod_{i=1}^{d} x_i^{(\#\text{ of occurrences of letter } i \text{ in } T)}.$$

The following formula from [Sta99, Corollary 7.21.4] thereby lets us count the number of such
tableaus:

$$s_\lambda(\underbrace{1, \dots, 1}_{d \text{ entries}}) = \prod_{□ \in [\lambda]} \frac{d + c(□)}{h(□)}.$$

We record here a consequence of the above two formulas:

Proposition 2.11. Let $\lambda$ be a partition and let $d \in \mathbb{Z}^+$. Then

$$s_\lambda(\underbrace{1, \dots, 1}_{d \text{ entries}}) = \frac{\dim(\lambda) \, d^{\uparrow \lambda}}{|\lambda|!}.$$

When $\ell(\lambda) > d$, there are no semistandard tableaus of shape $\lambda$ over alphabet $[d]$. Thus, the
sum is the empty sum. This gives us the following fact about Schur polynomials:

Proposition 2.12. Consider the Schur polynomial $s_\lambda(x_1, \dots, x_d)$. If $\ell(\lambda) > d$ then $s_\lambda = 0$.
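For small shapes, the combinatorial definition of the Schur polynomial can be evaluated by enumerating semistandard tableaus explicitly. The following is a brute-force sketch under that assumption (it is exponential in the size of the shape and only meant for sanity checks); the names `semistandard_tableaux`, `schur`, and `hook_content` are illustrative.

```python
from fractions import Fraction


def semistandard_tableaux(lam, d):
    """Yield all semistandard fillings of shape lam over [d] as dicts {(row, col): entry}."""
    lam = [part for part in lam if part > 0]
    cells = [(i, j) for i, row in enumerate(lam) for j in range(row)]

    def fill(k, T):
        if k == len(cells):
            yield dict(T)
            return
        i, j = cells[k]
        lo = 1
        if j > 0:
            lo = max(lo, T[(i, j - 1)])      # rows weakly increase left to right
        if i > 0:
            lo = max(lo, T[(i - 1, j)] + 1)  # columns strictly increase bottom to top
        for v in range(lo, d + 1):
            T[(i, j)] = v
            yield from fill(k + 1, T)
        T.pop((i, j), None)

    yield from fill(0, {})


def schur(lam, x):
    """Evaluate s_lam(x_1, ..., x_d) as a sum of monomials x^T over semistandard tableaus T."""
    total = 0
    for T in semistandard_tableaux(lam, len(x)):
        term = 1
        for v in T.values():
            term *= x[v - 1]
        total += term
    return total


def hook_content(lam, d):
    """Product over cells of (d + content) / hook length, as an exact fraction."""
    lam = [part for part in lam if part > 0]
    conj = [sum(1 for part in lam if part > j) for j in range(lam[0])] if lam else []
    result = Fraction(1)
    for i in range(1, len(lam) + 1):
        for j in range(1, lam[i - 1] + 1):
            hook = (lam[i - 1] - j) + (conj[j - 1] - i) + 1
            result *= Fraction(d + (j - i), hook)
    return result
```

Evaluating `schur` at the all-ones point counts the tableaus, which can then be compared against `hook_content`; evaluating at permuted arguments illustrates the symmetry discussed next.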


Though it is not at all obvious from the definition, the Schur polynomials are symmetric. This
can be inferred from the following classical fact (see e.g. [Sta99, Theorem 7.15.1]), which expresses
them as the ratio of a skew-symmetric polynomial and the Vandermonde determinant:

Theorem 2.13. $$s_\lambda(x_1, \dots, x_d) = \frac{\det\left(x_i^{\lambda_j + d - j}\right)_{ij}}{\det\left(x_i^{d - j}\right)_{ij}}.$$

We will actually not need this formula. Instead, we will next describe a combinatorial algorithm
which gives an interpretation for $s_\lambda(\eta_1, \dots, \eta_d)$ when $\mathcal{D} = (\eta_1, \dots, \eta_d)$ is a probability distribution.


2.3.2 The RSK algorithm 

We now describe the Robinson-Schensted-Knuth (RSK) algorithm RSK(·), which takes as input a
word $a \in \mathcal{A}^n$ and outputs a partition $\lambda = \mathrm{RSK}(a) \vdash n$. The relevance of RSK to quantum spectrum
testing is described at the end of this section. As there are many descriptions of the RSK algorithm
in the literature (see, e.g., [Knu70, Bay02, Dor05, Rom14]), we will be brief.

The RSK algorithm. Given as input a word $a = (a_1, \dots, a_n)$ over (ordered) alphabet $\mathcal{A}$, the
RSK algorithm produces a sequence $T_0, \dots, T_n$ of semistandard tableaus over $\mathcal{A}$, with $T_i$ having
size $i$ (and being thought of in French notation). Tableau $T_{i+1}$ is produced from tableau $T_i$ via
the “insertion” of letter $a_{i+1}$ into the 1st row. The insertion algorithm for letter $b$ into row $j$ of
tableau $T$ is as follows: Find the rightmost position in the $j$th row such that if $b$ were placed there,
weak-increasingness along row $j$ would be maintained. If this position is at the end of the row, the
insertion of $b$ is complete. If instead it is at a cell that already contains some letter $c$ (which will in
fact be the least $c$ in row $j$ with $c > b$) then $c$ is “bumped up”. By this we mean that the insertion
algorithm is recursively applied to letter $c$ and row $j + 1$ of $T$ (which may be a newly created row,
in which the insertion will immediately terminate with $c$ in its own row at the top of $T$). In the
end, the output of the RSK algorithm is the Young diagram $\lambda \vdash n$ given by the shape of $T_n$; i.e.,
$\mathrm{RSK}(a)$ is $T_n$ with its cell entries erased.

To get some feel for this algorithm, note that if the inserted word $a$ is weakly increasing then
$\mathrm{RSK}(a) = (n) \vdash n$. On the other hand, if $a$ is strongly decreasing, the output will be $\mathrm{RSK}(a) =
(1, 1, \dots, 1) \vdash n$. More generally, it is not hard to show that when $\mathrm{RSK}(a) = \lambda$, the value $\lambda_1$
is the length of the longest weakly increasing subsequence of $a$, and $\ell(\lambda) = \lambda'_1$ is the length of
the longest strongly decreasing subsequence of $a$. Even more generally, we have the following
theorem of Greene [Gre74], completely characterizing the partition $\mathrm{RSK}(a)$ in terms of increasing
subsequences:

Theorem 2.14. Let $\mathrm{RSK}(a) = \lambda$. Then for each $k \geq 1$, the value $\lambda_1 + \cdots + \lambda_k$ is the length of the
longest $k$-increasing subsequence in $a$ (as defined in Fact 1.6).
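The insertion procedure described above, together with the $k = 1$ case of Greene's theorem and its column analogue, can be sketched as follows. This is a minimal illustration rather than a reference implementation: the names `rsk_shape`, `longest_subsequence`, and `check_first_row_and_column` are assumptions, rows are stored bottom row first, and the bumped letter is located with `bisect_right` (the leftmost entry strictly greater than the inserted letter).

```python
from bisect import bisect_right
from itertools import product


def rsk_shape(word):
    """Shape of the tableau obtained by row-inserting the letters of word one at a time."""
    rows = []  # rows[0] is the bottom row in French notation
    for letter in word:
        for row in rows:
            pos = bisect_right(row, letter)  # leftmost entry strictly greater than letter
            if pos == len(row):
                row.append(letter)
                break
            row[pos], letter = letter, row[pos]  # bump the displaced letter up one row
        else:
            rows.append([letter])  # bumped letter starts a new top row
    return tuple(len(row) for row in rows)


def longest_subsequence(word, ok):
    """Length of the longest subsequence whose consecutive letters satisfy ok(prev, next)."""
    best = []
    for i, x in enumerate(word):
        best.append(1 + max((best[j] for j in range(i) if ok(word[j], x)), default=0))
    return max(best, default=0)


def check_first_row_and_column(d, n):
    """Exhaustively compare lambda_1 and the height of lambda with subsequence lengths over [d]^n."""
    for word in product(range(1, d + 1), repeat=n):
        shape = rsk_shape(word)
        if shape[0] != longest_subsequence(word, lambda u, v: u <= v):
            return False
        if len(shape) != longest_subsequence(word, lambda u, v: u > v):
            return False
    return True
```

The exhaustive check is only feasible for small $d$ and $n$, but it exercises every bumping pattern that can occur at those sizes.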

Indeed, the RSK algorithm is most commonly used in the literature to study the length of the longest
increasing subsequence of a random permutation (equivalently, of a random word $a \sim \mathcal{D}^{\otimes n}$, where
$\mathcal{D}$ denotes the uniform distribution on the alphabet $\mathcal{A} = [0, 1]$).

Let us note one immediate consequence of Greene’s theorem. (This consequence may also be 
derived directly from the description of the RSK algorithm.) 

Proposition 2.15. Given $a \in [d]^n$, let $\mathrm{RSK}(a) = \lambda$. Write $c_i(a)$ for the number of letter $i$'s in $a$.
Then $\lambda$ majorizes $c(a) := (c_1(a), \dots, c_d(a))$.

To see why this is true, note that for each $k \in [d]$, the all one's, all two's, ..., and all $k$'s subsequences
together form a $k$-increasing subsequence of size $c_1(a) + \cdots + c_k(a)$, which by Theorem 2.14 is at
most $\lambda_1 + \cdots + \lambda_k$, giving the proposition. As $c(a)$ is the histogram of $a$, this shows that we can
view $\mathrm{RSK}(a)$ as a “shifted histogram” of $a$ in which cells are shifted towards the lower numbers.
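The majorization statement can be tested exhaustively on small alphabets. The sketch below repeats a compact row-insertion routine so that it is self-contained; the names `rsk_shape`, `majorizes`, and `check_majorization` are illustrative assumptions, and majorization is checked on the histogram sorted into nonincreasing order.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate, product


def rsk_shape(word):
    """Shape of the RSK tableau of word (rows stored bottom row first)."""
    rows = []
    for letter in word:
        for row in rows:
            pos = bisect_right(row, letter)
            if pos == len(row):
                row.append(letter)
                break
            row[pos], letter = letter, row[pos]
        else:
            rows.append([letter])
    return tuple(len(row) for row in rows)


def majorizes(lam, counts):
    """True if every prefix sum of lam dominates the matching prefix sum of sorted counts."""
    size = max(len(lam), len(counts))
    lam = sorted(lam, reverse=True) + [0] * (size - len(lam))
    counts = sorted(counts, reverse=True) + [0] * (size - len(counts))
    return all(s >= t for s, t in zip(accumulate(lam), accumulate(counts)))


def check_majorization(d, n):
    """Check Proposition 2.15 for every word in [d]^n."""
    for word in product(range(1, d + 1), repeat=n):
        histogram = Counter(word)
        counts = [histogram.get(letter, 0) for letter in range(1, d + 1)]
        if not majorizes(rsk_shape(word), counts):
            return False
    return True
```

Sorting the histogram first gives the standard form of majorization; the unsorted prefix sums used in the argument above are then dominated as well.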

Although Greene's Theorem succinctly characterizes the output by the RSK algorithm, it is
important to retain the algorithm itself and even to consider an extension of it. Suppose that when
the RSK algorithm is applied to $a$ we also form a standard tableau $T'$ over alphabet $[n]$, where $T'$
has the same shape as $T_n$ and each cell □ in $T'$ is labeled by the “time” at which □ was created
in $T_n$. As noted by Knuth [Knu70], the word $a$ is uniquely determined by the pair $(T_n, T')$. As a
consequence of this and of previous formulas, it is not hard to verify the following important fact,
perhaps first observed by Its, Tracy, and Widom [ITW01, equation (2-1)]:

Proposition 2.16. Let $a \sim \mathcal{D}^{\otimes n}$, where $\mathcal{D} = (\eta_1, \dots, \eta_d)$ is a probability distribution on $[d]$. Then
for each $\lambda \vdash n$,

$$\Pr[\mathrm{RSK}(a) = \lambda] = \dim(\lambda) \cdot s_\lambda(\eta_1, \dots, \eta_d).$$

By the symmetry of the Schur polynomials, this implies the surprising fact that the distribution
of $\mathrm{RSK}(a)$ is invariant to permutations of $\mathcal{D}$.
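The permutation invariance just mentioned is easy to observe numerically even though it is far from obvious from the insertion rules. The following sketch computes the exact distribution of the RSK shape by enumerating all words; `rsk_shape` and `rsk_distribution` are illustrative names, and the enumeration is exponential in $n$, so it is meant only for small cases.

```python
from bisect import bisect_right
from collections import defaultdict
from fractions import Fraction
from itertools import product


def rsk_shape(word):
    """Shape of the RSK tableau of word (rows stored bottom row first)."""
    rows = []
    for letter in word:
        for row in rows:
            pos = bisect_right(row, letter)
            if pos == len(row):
                row.append(letter)
                break
            row[pos], letter = letter, row[pos]
        else:
            rows.append([letter])
    return tuple(len(row) for row in rows)


def rsk_distribution(eta, n):
    """Exact distribution of RSK(a) for a word a of n i.i.d. letters with probabilities eta."""
    dist = defaultdict(Fraction)
    for word in product(range(1, len(eta) + 1), repeat=n):
        prob = Fraction(1)
        for letter in word:
            prob *= eta[letter - 1]
        dist[rsk_shape(word)] += prob
    return dict(dist)
```

Comparing `rsk_distribution(eta, n)` with the result for any reordering of `eta` shows the two dictionaries agree exactly, even though individual words generally change shape when their letters are relabeled.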

Finally, we mention the connection between the RSK algorithm and quantum spectrum testing.
As we will eventually see in Section 2.6 (Remark 2.24), all of quantum spectrum testing can be
boiled down to classical testing of symmetric probability distributions $\mathcal{D}$, with the following twist:
Rather than getting to see a random word $a$ sampled from $\mathcal{D}^{\otimes n}$, the tester only gets to see the
partition $\lambda = \mathrm{RSK}(a)$. In light of Greene's Theorem 2.14, this statement is equivalent to Fact 1.6.

2.4 Representation theory, and the symmetric group 

Herein we recall some basics of representation theory. We will mainly focus on $\mathbb{C}$-representations of
finite groups $G$ (though at one point we will want to consider representations of the unitary group).
We may therefore define a representation $\mu$ of $G$ to be a group homomorphism from $G$ into $U_d$, for
some $d \in \mathbb{Z}^+$. Here $U_d$ denotes the group of $d \times d$ unitary matrices. The number $d$ is also called
the dimension of the representation $\mu$, and is denoted $\dim(\mu)$.

Two representations $\mu_1$ and $\mu_2$ are said to be isomorphic if there is some unitary matrix $U$ such
that $U \mu_1 U^\dagger = \mu_2$. In this case we write $\mu_1 \cong \mu_2$. The direct sum of $k$ representations $\mu_1, \dots, \mu_k$
produces the representation $\mu$ given by block-diagonal matrices:

$$\mu(g) := \begin{pmatrix} \mu_1(g) & 0 & \cdots & 0 \\ 0 & \mu_2(g) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mu_k(g) \end{pmatrix} \tag{3}$$

for all $g \in G$. Equivalently, we may write

$$\mu(g) = \bigoplus_{i=1}^{k} \mu_i(g). \tag{4}$$

We will also write $\mu = \mu_1 \oplus \cdots \oplus \mu_k$ to denote that $\mu$ is the direct sum of $\mu_1, \dots, \mu_k$.

Let $\mu_1$ be a representation of the group $G_1$ and $\mu_2$ be a representation of the group $G_2$. Then
the tensor product of $\mu_1$ and $\mu_2$, denoted $\mu_1 \otimes \mu_2$, is the representation defined by

$$(\mu_1 \otimes \mu_2)(g, h) := (\mu_1(g)) \otimes (\mu_2(h)),$$

where the right-hand side uses the ordinary matrix tensor product. We have $\dim(\mu_1 \otimes \mu_2) =
\dim(\mu_1) \cdot \dim(\mu_2)$.

In our setting, a representation $\mu$ of $G$ is said to be reducible if there are representations $\mu_1$ and $\mu_2$
such that $\mu \cong \mu_1 \oplus \mu_2$. Otherwise it is irreducible, and is often called an irrep for brevity. Every
representation can be uniquely decomposed into a direct sum of irreps (up to isomorphism and
rearrangement of summands). Further, the set of all irreps of $G$ (up to isomorphism), denoted $\widehat{G}$, is
finite. Indeed, if we define the regular representation of $G$ to be the $|G|$-dimensional representation $R$
given by $R(g) = \sum_{h \in G} |gh\rangle\langle h|$, then $R$'s decomposition into irreps contains every $\mu \in \widehat{G}$, with $\mu$
occurring $\dim(\mu)$ times. As a consequence, we have the formula

$$|G| = \sum_{\mu \in \widehat{G}} (\dim \mu)^2.$$

This fact leads to a natural probability distribution on irreps of $G$:

Definition 2.17. For a finite group $G$, the Plancherel distribution is the probability distribution
on irreps in which $\mu \in \widehat{G}$ has probability $(\dim \mu)^2 / |G|$.

For a group $G$ and a representation $\mu$, the character $\chi_\mu$ is the function $\chi_\mu : G \to \mathbb{C}$ defined by

$$\chi_\mu(g) = \mathrm{tr}(\mu(g)),$$

for each $g \in G$. We have the following simple fact:

Fact 2.18. Let $\mu$ be a representation of $G$. Then $\chi_\mu$ is a class function; i.e., it is constant on the
conjugacy classes of $G$.

We now recall some basics of Fourier analysis over an arbitrary finite group $G$ (though we will
ultimately only need the case $G = \mathfrak{S}_n$). For $f, g : G \to \mathbb{C}$ we define $\langle f, g \rangle = \mathbf{E}_{u \sim G}[f(u) \overline{g(u)}]$.
Under this inner product, the characters $(\chi_\mu)_{\mu \in \widehat{G}}$ form an orthonormal basis for the space of class
functions $f : G \to \mathbb{C}$. For general $f, g : G \to \mathbb{C}$ we define $(f * g)(u) = \mathbf{E}_{v \sim G}[f(v) g(v^{-1} u)]$; this
includes a nonstandard normalization by $\frac{1}{|G|}$. For a class function $f$ and $\mu \in \widehat{G}$ we employ the
following “Fourier notation”: $\widehat{f}(\mu) = \langle f, \chi_\mu \rangle$. (According to standard notation we would have
7(/^) = i^tr^/^). Then Fourier inversion is simply $f = \sum_{\mu \in \widehat{G}} \widehat{f}(\mu) \chi_\mu$. Further, if $g$ is another class
function we have the formula $\widehat{f * g}(\mu) = \frac{1}{\dim \mu} \widehat{f}(\mu) \widehat{g}(\mu)$.

We close this section by specifically discussing the representation theory of the symmetric
group $\mathfrak{S}_n$. Two permutations $\pi, \sigma \in \mathfrak{S}_n$ are conjugate within the group $\mathfrak{S}_n$ if and only if they have
the same cycle type. As a result, the conjugacy classes of $\mathfrak{S}_n$ can be identified with the partitions
of $n$. As it happens, the set of irreps of the symmetric group can also be naturally identified
with the partitions of $n$. For $\lambda \vdash n$, we will use the notation $\mathrm{p}_\lambda$ for the corresponding irrep of $\mathfrak{S}_n$.
(To avoid getting too far afield, we will not actually describe the representation $\mathrm{p}_\lambda$.) Recalling
Fact 2.18, we introduce the following notation:

Definition 2.19. Let $\lambda \vdash n$. We denote the character $\chi_{\mathrm{p}_\lambda}$ more simply as $\chi_\lambda$. We remark that
$\chi_\lambda$ is known to take on only rational values; in particular, $\overline{\chi_\lambda} = \chi_\lambda$. If $\mu \vdash n$ then we let $\chi_\lambda(\mu)$
denote $\chi_\lambda(\pi)$, where $\pi \in \mathfrak{S}_n$ is any permutation with cycle type $\mu$. This is well defined since $\chi_\lambda$
is constant on the conjugacy classes of $\mathfrak{S}_n$. Finally, we also write $\dim(\lambda)$ for $\dim(\mathrm{p}_\lambda)$. It is well
known [Sag01, Theorem 2.6.5] that $\dim(\lambda)$ is equal to the number of standard Young tableaus of
shape $\lambda$ over alphabet $[n]$, explaining the notation from Definition 2.9.

Following Stanley [Sta99, Corollary 7.17.5], we can actually give a definition of the symmetric
group characters $\chi_\lambda$ in terms of the power sum and Schur polynomials:

Theorem 2.20. In the context of Fourier analysis over the group $G = \mathfrak{S}_n$, suppose $\mu \vdash n$ and
$x \in \mathbb{C}^d$. Then $p_{(\cdot)}(x) := \pi \mapsto p_\pi(x)$ is a class function, and its Fourier coefficients are given by

$$\widehat{p_{(\cdot)}(x)}(\mu) = s_\mu(x).$$

Although this can be taken as an implicit definition of the characters $\chi_\mu$, we more often think
of the characters $\chi_\mu$ as “known” and of Theorem 2.20 as letting us express the Schur polynomials
in terms of the power sum polynomials.

2.5 Weak Schur sampling 

In this section we will introduce the weak Schur sampling algorithm. Our treatment of this topic 
will heavily follow the treatments given in Aram Harrow’s thesis [Har05] and the paper [CHW07]. 

To motivate the algorithm let us briefly consider the classical problem of testing symmetric
properties of probability distributions on $[d]$. In this model, the tester obtains a random word
$a = (a_1, \dots, a_n)$, where each letter $a_i$ is drawn independently from an unknown distribution $\mathcal{D}$
on $[d]$. The tester wants to decide whether $\mathcal{D}$ satisfies a certain symmetric property $\mathcal{P}$. Since
the samples $a_1, \dots, a_n$ are independent, the tester could (without loss of generality) randomly
permute them according to any $\pi \in \mathfrak{S}_n$. Similarly, since the property $\mathcal{P}$ is symmetric, the tester
could (again, without loss of generality) simultaneously apply any permutation $\sigma \in \mathfrak{S}_d$ to the
letters it sees. Roughly speaking, the tester can “factor out” the action of the group $\mathfrak{S}_n \times \mathfrak{S}_d$. The
information that remains is precisely the sorted type $\lambda \vdash n$ of $a$ (recall Definition 2.6).⁶ Thus we
see that the task of analyzing property testing of symmetric probability distributions boils down to
the task of understanding the random partition $\lambda \vdash n$ (of length at most $d$) induced as the sorted
type of a random word drawn from $\mathcal{D}^{\otimes n}$.

⁶ This partition carries the same information as the so-called “fingerprint” used in the classical property testing literature [Bat01, Val08].

A similar but more complicated state of affairs holds for quantum spectrum testing. In this
case, there is an unknown $d$-dimensional mixed state $\rho$, and the tester may measure $n$ copies, $\rho^{\otimes n}$,
in an attempt to determine whether $\rho$ satisfies a certain unitarily-invariant property $\mathcal{P}$. As before,
the tester could (without loss of generality) randomly permute the copies according to any $\pi \in \mathfrak{S}_n$.
And in this quantum scenario, by the unitary-invariance of $\mathcal{P}$, the tester could also (without loss
of generality) simultaneously apply any unitary $U \in U_d$ to each copy. Weak Schur sampling refers
to the process of “factoring out” this action of $\mathfrak{S}_n \times U_d$. What remains is again a random partition
$\lambda \vdash n$ of length at most $d$, whose distribution depends only on the spectrum of $\rho$. (In fact, as
we will see later in Remark 2.24, the distribution of $\lambda$ is precisely that of $\mathrm{RSK}(a)$ where $a$ is a
random word chosen according to the probability distribution on $[d]$ defined by $\rho$'s spectrum.) To
understand this situation more thoroughly, we will need to discuss representation theory in more
detail.⁷

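As mentioned above, the groups $\mathfrak{S}_n$ and $U_d$ each have a natural, unitary action on the space $(\mathbb{C}^d)^{\otimes n}$;
the associated representations $\mathrm{P}$ and $\mathrm{Q}$ (respectively) are defined on the standard basis vectors
$|a_1\rangle \otimes |a_2\rangle \otimes \cdots \otimes |a_n\rangle$ (for $a_i \in [d]$) via

$$\mathrm{P}(\pi) \, |a_1\rangle \otimes |a_2\rangle \otimes \cdots \otimes |a_n\rangle = |a_{\pi^{-1}(1)}\rangle \otimes |a_{\pi^{-1}(2)}\rangle \otimes \cdots \otimes |a_{\pi^{-1}(n)}\rangle,$$

$$\mathrm{Q}(U) \, |a_1\rangle \otimes |a_2\rangle \otimes \cdots \otimes |a_n\rangle = (U|a_1\rangle) \otimes (U|a_2\rangle) \otimes \cdots \otimes (U|a_n\rangle).$$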

We know the irreps of $\mathfrak{S}_n$ are indexed by partitions of $n$; thus, the representation $\mathrm{P}$ must decompose
as

$$\mathrm{P} \cong \bigoplus_{\lambda \vdash n} \bigoplus_{i=1}^{m_\lambda} \mathrm{p}_\lambda, \tag{5}$$

with $m_\lambda$ denoting the number of copies of $\mathrm{p}_\lambda$ in the decomposition. The representation $\mathrm{Q}$ also
decomposes into irreps of the group $U_d$. As it happens, these (infinitely many) irreps can also be
naturally identified with partitions; specifically, for each partition $\lambda \in \mathrm{Par}$ with length at most $d$,
there is an associated irrep $\mathrm{q}_\lambda \in \widehat{U_d}$. Furthermore, the theory of Schur-Weyl duality states that
there is significant joint structure to these two decompositions. This structure ultimately arises
because the two representations $\mathrm{P}$ and $\mathrm{Q}$ commute (i.e., $\mathrm{P}(\pi)\mathrm{Q}(U) = \mathrm{Q}(U)\mathrm{P}(\pi)$ for all $\pi \in \mathfrak{S}_n$,
$U \in U_d$), and hence the simultaneous action $\mathrm{PQ}$ defined by $\mathrm{PQ}(\pi, U) := \mathrm{P}(\pi)\mathrm{Q}(U)$ is a representation
of the direct product group $\mathfrak{S}_n \times U_d$.
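The commutation of the two actions can be confirmed on a tiny example by writing both representations as explicit matrices in the standard basis. The sketch below uses plain Python lists and exact rational arithmetic; the function names `P`, `Q`, `matmul`, and `compose`, the 0-indexed letters, and the particular rational rotation matrix chosen as the unitary are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product


def P(perm, d):
    """Matrix of P(pi) on (C^d)^{tensor n}: |a_1..a_n> -> |a_{pi^-1(1)}..a_{pi^-1(n)}>."""
    n = len(perm)
    inverse = [0] * n
    for i, image in enumerate(perm):
        inverse[image] = i
    words = list(product(range(d), repeat=n))
    index = {w: k for k, w in enumerate(words)}
    M = [[0] * len(words) for _ in words]
    for a in words:
        b = tuple(a[inverse[i]] for i in range(n))
        M[index[b]][index[a]] = 1
    return M


def Q(U, n):
    """Matrix of Q(U) = U tensor ... tensor U (n factors) in the standard basis."""
    words = list(product(range(len(U)), repeat=n))
    rows = []
    for b in words:
        row = []
        for a in words:
            entry = Fraction(1)
            for bi, ai in zip(b, a):
                entry *= U[bi][ai]
            row.append(entry)
        rows.append(row)
    return rows


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def compose(pi, sigma):
    """Permutation i -> pi(sigma(i))."""
    return tuple(pi[sigma[i]] for i in range(len(pi)))


# A real orthogonal (hence unitary) matrix with rational entries.
U = [[Fraction(3, 5), Fraction(-4, 5)], [Fraction(4, 5), Fraction(3, 5)]]
```

With $d = 2$ and $n = 3$ the matrices are $8 \times 8$, so checking $\mathrm{P}(\pi)\mathrm{Q}(U) = \mathrm{Q}(U)\mathrm{P}(\pi)$ for all six permutations is immediate; the same helpers also confirm that $\mathrm{P}$ respects composition.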

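Schur-Weyl duality. $$\mathrm{PQ} \cong \bigoplus_{\lambda \vdash n} \mathrm{p}_\lambda \otimes \mathrm{q}_\lambda.$$

In particular, by taking $U = \mathrm{id}$ we see that $m_\lambda$, the multiplicity of $\mathrm{p}_\lambda$ in the decomposition of $\mathrm{P}$, is
equal to $\dim(\mathrm{q}_\lambda)$. Similarly, by taking $\pi = \mathrm{id}$, we see that the multiplicity of $\mathrm{q}_\lambda$ in the decomposition
of $\mathrm{Q}$ is $\dim(\lambda) = \dim(\mathrm{p}_\lambda)$.

To restate Schur-Weyl duality, there exists a certain $d^n \times d^n$ unitary matrix $U_{\mathrm{Schur}}$ such that

$$U_{\mathrm{Schur}} \, \mathrm{P}(\pi) \mathrm{Q}(U) \, U_{\mathrm{Schur}}^\dagger = \sum_{\lambda \vdash n} |\lambda\rangle\langle\lambda| \otimes \mathrm{p}_\lambda(\pi) \otimes \mathrm{q}_\lambda(U) \tag{6}$$

for all $\pi \in \mathfrak{S}_n$, $U \in U_d$. We view $U_{\mathrm{Schur}}$ as a unitary linear transformation that performs a change
of basis, from the standard basis into the Schur basis. We may now state the weak Schur sampling
algorithm: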

⁷ In particular, we will go slightly beyond the framework from Section 2.4 by mentioning representations of the
unitary group, which is of course not a finite group.

Weak Schur sampling. Given $\rho^{\otimes n}$, where $\rho$ is a $d$-dimensional mixed state, the weak Schur
sampling algorithm works as follows:

1. Measure $\rho^{\otimes n}$ in the Schur basis, receiving basis state $|\lambda\rangle \otimes |\mathrm{p}\rangle \otimes |\mathrm{q}\rangle$.

2. Output $\lambda$, a partition of size $n$ and length at most $d$.

We will write $\mathrm{SW}^n_\rho$ for the distribution on partitions induced from $\rho^{\otimes n}$ by the weak Schur
sampling algorithm. We will also use the shorthand

$$\mathrm{SW}^n_\rho(\lambda) := \Pr_{\boldsymbol{\lambda} \sim \mathrm{SW}^n_\rho}[\boldsymbol{\lambda} = \lambda].$$

As we will state shortly, performing weak Schur sampling is without loss of generality in the
context of testing unitarily invariant properties. To see why, suppose $\rho$ is a $d$-dimensional mixed
state, and consider the product mixed state $\rho^{\otimes n}$. Then it's not too hard to show (using invariance
under $\mathrm{P}$ and Schur's Lemma, see e.g. [Har05, equation (6.1)]) that when represented in the Schur
basis, it has a “trivial $\mathfrak{S}_n$ register”:

Fact 2.21. We may write $U_{\mathrm{Schur}} \, \rho^{\otimes n} \, U_{\mathrm{Schur}}^\dagger = \sum_{\lambda \vdash n} |\lambda\rangle\langle\lambda| \otimes I \otimes M_\lambda$ for some matrices $M_\lambda$. Here,
for each $\lambda$ we interpret $I$ as the $\dim(\lambda) \times \dim(\lambda)$ identity matrix.

As a consequence, it makes sense that a testing algorithm may discard the $\mathfrak{S}_n$ register. Now
in general, the “$U_d$ register” of $\rho^{\otimes n}$ is not trivial, and thus it may seem like the tester is
losing information by discarding it. (Indeed, this potential loss is the source of the word “weak”
in the phrase “weak Schur sampling”.) However when testing unitarily invariant properties of $\rho$,
the state $\rho^{\otimes n}$ should be treated no differently than the state $\mathrm{Q}(U) \rho^{\otimes n} \mathrm{Q}(U)^\dagger = (U \rho U^\dagger)^{\otimes n}$, for any
$U \in U_d$. In particular, a tester could average over all unitaries $U$, and this would cause the
resulting state to have a trivial $U_d$ register in the Schur basis. This idea is formalized in the
next lemma, which shows that weak Schur sampling is an optimal quantum measurement for the
testing of unitarily invariant properties. The lemma, implicit in [CHW07], can be found with proof
in [MdW13, Lemma 19].

Lemma 2.22. Let $\mathcal{P}$ be a unitarily invariant property of $d$-dimensional mixed states. Assume there
exists a tester which uses $n$ copies of the input state $\rho$, accepts all states $\rho \in \mathcal{P}$ with probability
at least $1 - \delta$, but accepts all states which are $\epsilon$-far from $\mathcal{P}$ with probability at most $1 - f(\epsilon)$ for
$\epsilon > 0$. Then there exists a tester with the same parameters which consists of performing weak Schur
sampling on $\rho^{\otimes n}$ and then classically postprocessing the results.


As a result of this lemma, we are able to focus exclusively on the weak Schur sampling algorithm in
this paper. One final remark: Although our quantum spectrum testing upper bounds are formally
only concerned with copy complexity, they can in fact also be implemented efficiently, by (quantum)
algorithms running in time $\mathrm{poly}(n, \log d, \log(1/\epsilon))$. This holds because the only expensive operation
is the computation of the Schur change-of-basis, and this can be done in $\mathrm{poly}(n, \log d, \log(1/\epsilon))$ time;
see [BCH05, Appendix A], [Har05, Section 8.1.1].


2.6 Understanding the weak Schur sampling distribution 

There are several ways to understand the probability distribution induced by the weak Schur sampling
algorithm, each of which proves advantageous in different settings. Let us begin with a direct
calculation that expresses the probabilities in terms of the Schur polynomials. The following known
fact may be attributed to Alicki et al. [ARS88]; see [Aud06, equation (36)] for further discussion.
We will include a proof for the reader's convenience.

Proposition 2.23. Let $\rho$ be a $d$-dimensional density matrix with eigenvalues $\eta_1, \eta_2, \dots, \eta_d$. Then

$$\mathrm{SW}^n_\rho(\lambda) = \dim(\lambda) \cdot s_\lambda(\eta_1, \eta_2, \dots, \eta_d).$$

In particular, $\mathrm{SW}^n_\rho$ depends only on the spectrum of $\rho$.

Remark 2.24. As this is the exact same formula as in Proposition 2.16, we conclude that if $\mathcal{D}$ is
the probability distribution on $[d]$ given by the spectrum of $\rho$ (in any order), then

$$\mathrm{SW}^n_\rho(\lambda) = \Pr_{a \sim \mathcal{D}^{\otimes n}}[\mathrm{RSK}(a) = \lambda].$$

This gives a completely “quantum-free” way of analyzing quantum spectrum testing, as mentioned
in Fact 1.6. Nevertheless, we will actually use this fact only occasionally (mainly via Theorem 2.14).
As we will see later, interpreting $\mathrm{SW}^n_\rho$ via representation theory proves to be more powerful.

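Proof of Proposition 2.23. By definition, $\mathrm{SW}^n_\rho(\lambda) = \mathrm{tr}(\Pi_\lambda \rho^{\otimes n})$, where $\Pi_\lambda$ denotes the operator that
projects onto the subspace corresponding to $\lambda$ in the Schur basis. It is a basic fact of representation
theory (following from orthogonality relations, see e.g. [CHW07, Equation (7)]) that from the
decomposition (5) of $\mathrm{P}$ we may deduce

$$\Pi_\lambda = \dim(\lambda) \cdot \mathbf{E}_{\pi \sim \mathfrak{S}_n}\left[\overline{\chi_{\mathrm{p}_\lambda}(\pi)} \cdot \mathrm{P}(\pi)\right] = \dim(\lambda) \cdot \mathbf{E}_{\pi \sim \mathfrak{S}_n}\left[\chi_\lambda(\pi) \cdot \mathrm{P}(\pi)\right].$$

Thus

$$\mathrm{SW}^n_\rho(\lambda) = \dim(\lambda) \cdot \mathbf{E}_{\pi \sim \mathfrak{S}_n}\left[\chi_\lambda(\pi) \cdot \mathrm{tr}(\mathrm{P}(\pi) \rho^{\otimes n})\right].$$

To compute the trace, we may assume by unitary invariance that $\rho = \mathrm{diag}(\eta_1, \dots, \eta_d)$. Thus

$$\rho^{\otimes n} = \sum_{\text{words } (a_1, \dots, a_n) \in [d]^n} \left(\prod_{i=1}^{n} \eta_{a_i}\right) |a_1, \dots, a_n\rangle\langle a_1, \dots, a_n|.$$

Notice that if we let $\mathcal{D}_\eta$ denote the probability distribution on $[d]$ in which $\mathcal{D}_\eta(a) = \eta_a$, then the
coefficient $\prod_{i=1}^{n} \eta_{a_i}$ above is $\mathcal{D}_\eta^{\otimes n}(a_1, \dots, a_n)$; i.e., the probability that a random length-$n$ word
drawn i.i.d. from $\mathcal{D}_\eta$ is equal to $(a_1, \dots, a_n)$. From the definition of $\mathrm{P}(\pi)$ we further deduce

$$\mathrm{P}(\pi) \rho^{\otimes n} = \sum_{(a_1, \dots, a_n)} \mathcal{D}_\eta^{\otimes n}(a_1, \dots, a_n) \, |a_{\pi^{-1}(1)}, \dots, a_{\pi^{-1}(n)}\rangle\langle a_1, \dots, a_n|.$$

We immediately conclude that $\mathrm{tr}(\mathrm{P}(\pi) \rho^{\otimes n})$ is equal to the sum over all $\pi$-invariant words $(a_1, \dots, a_n)$
of $\mathcal{D}_\eta^{\otimes n}(a_1, \dots, a_n)$. Recalling Definition 2.7, this is precisely given by the power sum polynomial
$p_\pi(\eta_1, \dots, \eta_d)$. Therefore

$$\mathrm{SW}^n_\rho(\lambda) = \dim(\lambda) \cdot \mathbf{E}_{\pi \sim \mathfrak{S}_n}\left[\chi_\lambda(\pi) \cdot p_\pi(\eta_1, \dots, \eta_d)\right],$$

and the proposition now follows from Theorem 2.20. □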

For the purposes of the testing lower bounds in this paper, the case of greatest interest to us is
when $\rho = \frac{1}{d} I_{d \times d}$ is the maximally mixed $d$-dimensional state; i.e., the spectrum of $\rho$ is the uniform
distribution $\mathrm{Unif}_d = (\frac{1}{d}, \dots, \frac{1}{d})$. This is also by far the most well-studied case in the literature:

Definition 2.25. The Schur-Weyl distribution with parameters $n$ and $d$, which we denote $\mathrm{SW}^n_d$,
is the distribution on partitions $\lambda \vdash n$ of length at most $d$ given by $\mathrm{SW}^n_\rho$ in the case that $\rho$ is
the maximally mixed state of dimension $d$. Equivalently, it is the distribution of $\mathrm{RSK}(a)$, where
$a \sim [d]^n$ is uniformly random.

Combining Proposition 2.23 and Proposition 2.11, together with the homogeneity of the Schur
polynomials, we obtain the following known formula (cf. [CHW07, equation (26)]):

Proposition 2.26. $\mathrm{SW}^n_d(\lambda) = \dfrac{\dim(\lambda)^2}{n!} \cdot \dfrac{d^{\uparrow\lambda}}{d^n}$.

Notice that if $n$ is held fixed and $d \to \infty$, the fraction $d^{\uparrow\lambda}/d^n$ tends to 1 and we obtain the Plancherel
distribution (for $\mathfrak{S}_n$) on partitions described in Definition 2.17. This recovers the well-known fact
that the Plancherel distribution is obtained by running the RSK algorithm on a uniformly random
permutation (equivalently, a uniformly random word from $[0,1]^n$). We will write $\mathrm{Planch}_n$ for this
distribution.

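Remark 2.27. It is easy to see that $\mathrm{SW}^n_d(\lambda) = \frac{1}{d^n} \cdot \dim(\mathrm{p}_\lambda) \cdot \dim(\mathrm{q}^d_\lambda)$. From Remark 2.24, we see
that there are $\dim(\mathrm{p}_\lambda) \cdot \dim(\mathrm{q}^d_\lambda)$ words $a \in [d]^n$ such that $\mathrm{RSK}(a) = \lambda$.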
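As a sanity check on Proposition 2.26 and Remark 2.27, one can enumerate all words in $[d]^n$ for small $n, d$ and compare the empirical distribution of RSK shapes with the closed formula. The sketch below is not from the paper; it assumes that $d^{\uparrow\lambda}$ stands for $\prod_{\square \in \lambda}(d + c(\square))$ with $c(\square)$ the content (column index minus row index), and it uses row insertion with weakly increasing rows for RSK. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial


def rsk_shape(word):
    """Shape of the RSK insertion tableau (rows weakly increasing)."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)      # first entry strictly larger than x
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]         # bump that entry to the next row
        else:
            rows.append([x])
    return tuple(len(r) for r in rows)


def dim_irrep(shape):
    """dim(lambda) via the hook length formula."""
    n = sum(shape)
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = 1
    for i, r in enumerate(shape):
        for j in range(r):
            hooks *= (r - j - 1) + (cols[j] - i - 1) + 1   # arm + leg + 1
    return factorial(n) // hooks


def sw_formula(shape, d):
    """dim(lambda)^2 / n! * prod_boxes (d + content) / d^n  (assumed meaning of d^{up lambda})."""
    n = sum(shape)
    rising = 1
    for i, r in enumerate(shape):
        for j in range(r):
            rising *= d + (j - i)
    return Fraction(dim_irrep(shape) ** 2 * rising, factorial(n) * d**n)


def sw_by_enumeration(n, d):
    """Exact distribution of RSK(a) for a uniform over [d]^n."""
    counts = Counter(rsk_shape(a) for a in product(range(d), repeat=n))
    return {lam: Fraction(c, d**n) for lam, c in counts.items()}


dist = sw_by_enumeration(4, 3)
print(all(sw_formula(lam, 3) == p for lam, p in dist.items()))
```

The enumeration costs $d^n$ insertions, so this is only meant for toy parameters.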

2.7 Asymptotic theory of the symmetric group 

For small $n$, the exact distribution on partitions of $n$ given by the Plancherel or Schur–Weyl
distributions is not particularly easy to understand. As a result, a significant body of work has
been devoted to showing asymptotic properties of these distributions as $n$ grows large.

Let us focus first on the Plancherel measure. Perhaps the most basic thing one could ask for is
the “typical” width and height of a diagram drawn from this distribution. Though either of these
values could be as large as $n$, Hammersley [Ham72] showed that both values tend to concentrate
around the same number $c \cdot \sqrt{n}$, for some constant $c$ (later determined to be $c = 2$ [LS77, VK77]).
Therefore, in order to put partitions of different values of $n$ on equal footing, we can define scaled
partitions as follows:

Definition 2.28. Let $\lambda \vdash n$ and recall Definition 2.3. Then $\bar\lambda : \mathbb{R} \to \mathbb{R}_+$ is defined as $\bar\lambda(x) :=
\lambda(\sqrt{n} \cdot x)/\sqrt{n}$, for all $x$.

Logan and Shepp [LS77] and Vershik and Kerov [VK77] independently proved the so-called “law
of large numbers” for the Plancherel distribution, showing that when $\lambda \sim \mathrm{Planch}_n$ and $n \to \infty$,
the function $\bar\lambda$ converges to $\Omega(x)$, the curve defined as

$$\Omega(x) := \begin{cases} \frac{2}{\pi}\bigl(x \arcsin\tfrac{x}{2} + \sqrt{4 - x^2}\bigr), & |x| \le 2, \\ |x|, & |x| \ge 2. \end{cases}$$

This “ice cream cone”-shaped function is pictured in Figure 3 ($c = 0$ case). Though this curve is a
limiting shape rather than the Russian notation of any Young diagram, it is useful to think of it
as a continual analogue of a Young diagram, as per the following definition.

Definition 2.29. A continual diagram is a function $f : \mathbb{R} \to \mathbb{R}$ satisfying (i) $f$ is 1-Lipschitz and
(ii) $f(x) = |x|$ when $|x|$ is sufficiently large.

This definition originates in the paper of [Ker93a]. 

More recently, Kerov [Ker93b] showed a “central limit theorem” for the Plancherel measure,
characterizing the deviation of a random Young diagram from the curve $\Omega(x)$ by a certain Gaussian
process. A second proof of this result, also by Kerov, was given in the paper of Ivanov and
Olshanski [IO02]. Much of our work is based on the techniques of this paper.

Figure 3: The Biane limiting curves $\Omega_c$. The $c = 0$ case corresponds to the function $\Omega(x)$.

Subsequent studies revealed that a similar state of affairs exists for the Schur–Weyl distribution $\mathrm{SW}^n_d$,
though in this case the features of a “typical” $\lambda \sim \mathrm{SW}^n_d$ depend on the ratio $c := \sqrt{n}/d$.
Biane [Bia01] extended the Plancherel law of large numbers to the Schur–Weyl distribution in the
case when $c$ is a fixed constant and $n, d \to \infty$. In this case, for a random $\lambda \sim \mathrm{SW}^n_d$, the function
$\bar\lambda$ will approach a certain limiting curve $\Omega_c$, specified as follows:


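Theorem 2.30 ([Bia01]). Fix an absolute constant $c > 0$ and assume $n, d \to \infty$ with $\sqrt{n}/d \to c$.
Then for every constant $\epsilon > 0$,

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}\bigl[\|\bar\lambda - \Omega_c\|_\infty \ge \epsilon\bigr] \to 0,$$

where $\Omega_c$ is the continual diagram defined as follows:

$$\Omega_{c \in (0,1)}(x) := \begin{cases} \dfrac{2}{\pi}\Bigl(x \arcsin\Bigl(\dfrac{x + c}{2\sqrt{1 + xc}}\Bigr) + \dfrac{1}{c}\arccos\Bigl(\dfrac{2 + xc - c^2}{2\sqrt{1 + xc}}\Bigr) + \dfrac{1}{2}\sqrt{4 - (x - c)^2}\Bigr) & \text{if } |x - c| \le 2, \\ |x| & \text{otherwise;} \end{cases}$$

$$\Omega_{1}(x) := \begin{cases} \dfrac{x + 1}{2} + \dfrac{1}{\pi}\Bigl((x - 1)\arcsin\Bigl(\dfrac{x - 1}{2}\Bigr) + \sqrt{4 - (x - 1)^2}\Bigr) & \text{if } |x - 1| \le 2, \\ |x| & \text{otherwise;} \end{cases}$$

$$\Omega_{c > 1}(x) := \begin{cases} x + \dfrac{2}{c} & \text{if } x \in (-\tfrac{1}{c},\, c - 2), \\ \dfrac{2}{\pi}\Bigl(x \arcsin\Bigl(\dfrac{x + c}{2\sqrt{1 + xc}}\Bigr) + \dfrac{1}{c}\arccos\Bigl(\dfrac{2 + xc - c^2}{2\sqrt{1 + xc}}\Bigr) + \dfrac{1}{2}\sqrt{4 - (x - c)^2}\Bigr) & \text{if } |x - c| \le 2, \\ |x| & \text{otherwise.} \end{cases}$$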

These curves are pictured for various values of $c$ in Figure 3 (which we have reproduced from [Mel10a]).
Méliot [Mel10a, Mel10b] has extended Kerov’s central limit theorem to the Schur–Weyl distribution,
characterizing the fluctuations of $\bar\lambda$ around the limiting curves given by Biane.

One consequence of these results is that when $n = o(d^2)$, the function $\bar\lambda$ converges to the ice
cream cone curve $\Omega(x)$ from above. This fact is a manifestation of the discussion at the end of
Section 2.6 concerning $\mathrm{SW}^n_d$ tending to $\mathrm{Planch}_n$ as $d \to \infty$. Indeed, Childs et al. [CHW07] showed
that when $n = o(d)$, the two distributions are statistically indistinguishable (from which the lower
bound in Theorem 1.9 follows via the triangle inequality $d_{\mathrm{TV}}(\mathrm{SW}^n_d, \mathrm{SW}^n_{d/2}) \le d_{\mathrm{TV}}(\mathrm{SW}^n_d, \mathrm{Planch}_n) +
d_{\mathrm{TV}}(\mathrm{Planch}_n, \mathrm{SW}^n_{d/2})$).

We close this section by recording some simple concentration bounds on the width and length
of $\lambda \sim \mathrm{SW}^n_d$. They are not as precise as what is suggested by the above limit theorems, but they
have the advantage of giving concrete error bounds. We follow a simple line of argument similar to
that in [Rom14, Lemma 1.5].

Proposition 2.31. Let $\lambda \sim \mathrm{SW}^n_d$. For every $B \in \mathbb{Z}^+$ we have

$$\Pr[\lambda_1 \ge B] \le \Bigl(\frac{(1 + B/d)\,e^2 n}{B^2}\Bigr)^{B}.$$

The same bound holds for $\Pr[\lambda'_1 \ge B]$.

Proof. By Theorem 2.14, $\Pr[\lambda_1 \ge B]$ (respectively, $\Pr[\lambda'_1 \ge B]$) is equal to the probability that a
uniformly random word from $[d]^n$ contains a weakly increasing (respectively, strongly increasing)
subsequence of length exactly $B$. As weakly increasing subsequences are more probable than
strongly increasing ones, it suffices to bound

$$\Pr[\lambda_1 \ge B] \le \Bigl(\frac{(1 + B/d)\,e^2 n}{B^2}\Bigr)^{B}.$$

Letting $S$ denote the number of weakly increasing subsequences of length $B$ in a random word we
have

$$\Pr[\lambda_1 \ge B] \le \mathbf{E}[S] = \binom{n}{B} \cdot \frac{c}{d^B},$$

where $c$ is the number of words in $[d]^B$ which are weakly increasing. Evidently $c$ also equals
the number of “weak $d$-compositions of $B$”, which [Sta11, Chapter 1.2] is $\binom{B + d - 1}{B}$. We
conclude

$$\Pr[\lambda_1 \ge B] \le \frac{\binom{n}{B}\binom{B + d - 1}{B}}{d^B} \le \frac{(en/B)^B \cdot (e(B + d)/B)^B}{d^B} = \Bigl(\frac{(1 + B/d)\,e^2 n}{B^2}\Bigr)^{B},$$

as needed. □
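To get a feel for how loose or tight the bound of Proposition 2.31 is, one can compare the exact first-moment quantity $\mathbf{E}[S]$ from the proof with the closed-form bound. The following is a small numerical sketch (function names are ours, not the paper's); it only evaluates the two expressions and does not simulate RSK.

```python
from math import comb, e


def expected_weakly_increasing(n, d, B):
    """E[S] = C(n, B) * C(B + d - 1, B) / d^B, the union-bound quantity in the proof."""
    return comb(n, B) * comb(B + d - 1, B) / d**B


def closed_form_bound(n, d, B):
    """((1 + B/d) * e^2 * n / B^2)^B, the bound stated in Proposition 2.31."""
    return ((1 + B / d) * e**2 * n / B**2) ** B


for n, d, B in [(100, 100, 40), (100, 10, 80), (400, 20, 120)]:
    print(n, d, B, expected_weakly_increasing(n, d, B), closed_form_bound(n, d, B))
```

The bound is only informative once $B$ is large enough that the base of the power drops below 1, which for $B \le d$ happens roughly when $B$ exceeds $e\sqrt{2n}$.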

2.8 Polynomial algebras 

We have already discussed the power sum and Schur polynomials, which are elements of the $\mathbb{C}$-algebra
$\Lambda$ of symmetric polynomials in indeterminates $x_1, x_2, \dots$. Important to our work will be
a closely related polynomial algebra $\Lambda^*$, the algebra of shifted symmetric polynomials, formally
introduced in [OO98b]. This algebra consists of those polynomials which are symmetric
in the “shifted” indeterminates $\tilde{x}_i := x_i - i + c$, where $c$ is any fixed constant. (The definition
does not depend on the constant $c$.) When we view the inputs to the shifted symmetric functions
$x_1, x_2, \dots$ as the values $\lambda_1, \lambda_2, \dots$ of a partition $\lambda$, the result is (isomorphic to) Kerov’s algebra
of polynomial functions on the set of Young diagrams, also known as the algebra of observables
of diagrams. In a nutshell, the importance of this algebra is that, on one hand, it still contains
polynomials that are similar to “power sums” or “moments” of the $\lambda_i$’s; and, on the other hand, it
is easier to compute their expected value under $\mathrm{SW}^n_\rho$ distributions.

(Footnote on the symmetric polynomials in $x_1, x_2, \dots$: Strictly speaking, these are families of bounded-degree polynomials, one for each number of indeterminates,
which are stable in the sense that $p_\lambda(x_1, \dots, x_d, 0) = p_\lambda(x_1, \dots, x_d)$, and similarly for $s_\lambda$. See, e.g., [Mac95] for a
formal definition via projective limits.)

We will need to study several families of observables/shifted symmetric polynomials, and their
relationships:

Definition 2.32. The following polynomials are known to be elements of $\Lambda^*$. (We describe the
first four as observables of Young diagrams.)

• For $k \ge 1$,

$$p^*_k(\lambda) := \sum_{i=1}^{d(\lambda)} \bigl((a^*_i)^k - (-b^*_i)^k\bigr) = \sum_{i=1}^{\ell(\lambda)} \bigl((\lambda_i - i + \tfrac{1}{2})^k - (-i + \tfrac{1}{2})^k\bigr).$$

These are the most basic polynomials on Young diagrams, giving the “moments” of the coordinates.
For more information on them see [IO02], where they are introduced (in equation (1.4))
under the notation $p_k(\lambda)$. We use the notation $p^*_k(\lambda)$ to distinguish them from the ordinary
power sum symmetric polynomials. It is obvious from the second definition above that the
$p^*_k$ polynomials are in $\Lambda^*$. In fact they are algebraically independent, and they generate $\Lambda^*$.

• For $k \ge 0$, the $k$th content sum polynomial is $c_k(\lambda) := \sum_{\square \in \lambda} c(\square)^k$. Although these polynomials
are quite natural, we will have little occasion to use them. The fact that they are in $\Lambda^*$
was proven in [KO94].


• For $k \ge 2$,

$$\tilde{p}_k(\lambda) := k(k - 1)\int_{-\infty}^{\infty} x^{k-2}\sigma(x)\,dx,$$

where $\sigma(x) := \tfrac{1}{2}(\lambda(x) - |x|)$. These polynomials were introduced and shown to be algebraically
independent generators of $\Lambda^*$ in [IO02, Section 2]. They can be shown to be the “moments of
the local extrema of $\lambda(x)$”, and are also useful for studying continual diagrams. We use them
only briefly, to pass between the $p^*_k$ polynomials and $p^\sharp_k$ polynomials defined below.

• For $\lambda \vdash n$ and $\mu \vdash k$, the central characters are defined by

$$p^\sharp_\mu(\lambda) := \begin{cases} n^{\downarrow k} \cdot \dfrac{\chi_\lambda(\mu \cup 1^{n-k})}{\dim(\lambda)} & \text{if } n \ge k, \\ 0 & \text{if } n < k, \end{cases}$$

where $\mu \cup 1^{n-k}$ denotes the partition $(\mu, 1, 1, \dots, 1) \vdash n$. In case $\mu = (k)$ we simply write
$p^\sharp_k(\lambda)$. Note that we are somewhat unexpectedly applying the character $\chi_\lambda$ to (an extension
of) $\mu$, and not the other way around. The advantage of the $p^\sharp_\mu$ polynomials is that, by virtue of
them being characters of the symmetric group (up to some normalizations), their expectations
under $\mathrm{SW}^n_\rho$ can be easily calculated exactly, as we will see below. A disadvantage is that,
by virtue of them being characters of the symmetric group, explicit formulas for them are
famously quite complex [Las08, Fer10] (though in Section 2.8.1 we will mention a formula
that allows one to compute $p^\sharp_k$ for small $k$ fairly easily). Wassermann [Was81, III.6] showed
that the $p^\sharp_k$ polynomials are in $\Lambda^*$, and in fact [VK81, KO94, OO98b] more generally the
polynomials $p^\sharp_\mu$ form a linear basis of $\Lambda^*$.





• For $\mu \vdash k$, the shifted Schur polynomial in indeterminates $x_1, \dots, x_d$ is

$$s^*_\mu(x_1, \dots, x_d) = \frac{\det\bigl[(x_i + d - i)^{\downarrow(\mu_j + d - j)}\bigr]}{\det\bigl[(x_i + d - i)^{\downarrow(d - j)}\bigr]}.$$

These polynomials are the shifted analogues of the Schur polynomials (cf. Theorem 2.13).
They were introduced by Okounkov and Olshanski [OO98b], and are similar to the earlier-defined
“factorial Schur functions” (see, e.g., [Mac95, I.3.20-21]), but with the advantage that
they are stable, i.e., $s^*_\mu(x_1, \dots, x_d, 0) = s^*_\mu(x_1, \dots, x_d)$. They arise for us because they can
sometimes be used to express the ratio of two Schur functions (see the “Binomial Formula”
Theorem 4.6). To analyze them, we will use the following “shifted analogue” of Theorem 2.20,
proved in [OO98b, Theorem 8.1], [IK01, Theorem 9.1] (see also [Mel10b, p.25]):

Theorem 2.33. For $\mu \vdash k$, let us think of the central character polynomial $p^\sharp_\mu$ not as an
observable of Young diagrams (applied to $\lambda_1, \dots, \lambda_d$) but as a shifted symmetric polynomial
in indeterminates $x_1, \dots, x_d$. In the context of Fourier analysis over the group $G = \mathfrak{S}_k$, for
each fixed $x \in \mathbb{R}^d$ we may think of $p^\sharp_{\bullet}(x) := \pi \mapsto p^\sharp_\pi(x)$ as a class function. Then its Fourier
coefficients are given by

$$\widehat{p^\sharp_{\bullet}(x)}(\mu) = s^*_\mu(x).$$

(Note that given the determinantal definition of the shifted Schur polynomials, one may alternatively
take this Theorem as a definition of the shifted symmetric polynomials $p^\sharp_\mu(x)$.)

As mentioned, the $p^\sharp_\mu$ polynomials are especially important for us because there is a simple
expression for their expectation under any Schur–Weyl distribution. This is the subject of our next
proposition.

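Proposition 2.34. Let $\rho$ be a $d \times d$ density matrix with eigenvalues $\eta_1, \dots, \eta_d$ and let $\mu \vdash k$. Then

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_\rho}\bigl[p^\sharp_\mu(\lambda)\bigr] = n^{\downarrow k} \cdot p_\mu(\eta_1, \dots, \eta_d).$$

Proof. It’s immediate from the definitions that both sides are 0 if $n < k$, so we assume $n \ge k$.
Applying Proposition 2.23 and the definition of $p^\sharp_\mu$ we obtain

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_\rho}\bigl[p^\sharp_\mu(\lambda)\bigr] = n^{\downarrow k} \sum_{\lambda \vdash n} s_\lambda(\eta_1, \dots, \eta_d) \cdot \chi_\lambda(\mu \cup 1^{n-k}) = n^{\downarrow k} \cdot p_{\mu \cup 1^{n-k}}(\eta_1, \dots, \eta_d),$$

where the second equation is from Theorem 2.20. But $p_{\mu \cup 1^{n-k}}(\eta_1, \dots, \eta_d) = p_\mu(\eta_1, \dots, \eta_d)$, since
the two quantities differ only by factors of $p_1(\eta_1, \dots, \eta_d) = \eta_1 + \cdots + \eta_d = 1$. □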
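Proposition 2.34 can be checked exactly on toy instances using the “quantum-free” description of $\mathrm{SW}^n_\rho$ (the RSK shape of an i.i.d. word). The sketch below does this for $\mu = (2)$, where $p^\sharp_2(\lambda) = 2c_1(\lambda)$ is twice the sum of contents of $\lambda$ and the claimed expectation is $n(n-1)\sum_i \eta_i^2$. The function names and the choice of spectrum are ours; the RSK routine uses row insertion with weakly increasing rows.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import product


def rsk_shape(word):
    """Shape of the RSK insertion tableau (rows weakly increasing)."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]
        else:
            rows.append([x])
    return tuple(len(r) for r in rows)


def p_sharp_2(shape):
    """p#_2(lambda) = 2 * (sum over boxes of column index minus row index)."""
    return 2 * sum(j - i for i, r in enumerate(shape) for j in range(r))


def expected_p_sharp_2(n, eta):
    """Exact E[p#_2(lambda)] for lambda = RSK shape of an i.i.d. word with letter law eta."""
    total = Fraction(0)
    for a in product(range(len(eta)), repeat=n):
        weight = Fraction(1)
        for letter in a:
            weight *= eta[letter]
        total += weight * p_sharp_2(rsk_shape(a))
    return total


eta = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
n = 4
print(expected_p_sharp_2(n, eta), n * (n - 1) * sum(e**2 for e in eta))
```

Both printed values should coincide; taking `eta` uniform recovers the value $n(n-1)/d$ used later for the maximally mixed state.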

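Note that in the case of $\eta_1 = \cdots = \eta_d = 1/d$, we have that $p_\mu(\eta_1, \dots, \eta_d) = d^{\ell(\mu) - k}$. This gives
us the following important corollary:

Corollary 2.35. Let $\mu \vdash k$. Then $\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[p^\sharp_\mu(\lambda)\bigr] = n^{\downarrow k} \cdot d^{\ell(\mu) - k}$.

2.8.1 Working with the $p^\sharp_\mu$ polynomials

As we will be working heavily with the $p^\sharp_\mu$ polynomials, let us describe them further. We begin
with the simpler case of the $p^\sharp_k$ polynomials. Let us see how these polynomials can be written in
terms of the $p^*_k$ polynomials. From [Was81, III.6] (cf. [IO02, Proposition 3.3]) we have the following
identity using generating functions:

$$p^\sharp_k = [t^{k+1}]\Bigl\{ -\frac{1}{k}\prod_{j=1}^{k}\bigl(1 - (j - \tfrac{1}{2})t\bigr) \cdot \exp\Bigl(\sum_{j=1}^{\infty} \frac{p^*_j t^j}{j}\bigl(1 - (1 - kt)^{-j}\bigr)\Bigr)\Bigr\}.$$

One may rewrite this (cf. [IO02, (3.3)]) as

$$p^\sharp_k = [t^{k+1}]\Bigl\{ -\frac{1}{k}\prod_{j=1}^{k}\bigl(1 - (j - \tfrac{1}{2})t\bigr) \cdot \sum_{i=0}^{\infty} \frac{(-1)^i}{i!}\,Q_k(t)^i \Bigr\}, \qquad (7)$$

where

$$Q_k(t) = \sum_{m=1}^{\infty} \frac{p^*_m t^m}{m}\bigl((1 - kt)^{-m} - 1\bigr) = \sum_{m=1}^{\infty} \frac{p^*_m}{m}\sum_{r=1}^{\infty}\binom{m + r - 1}{r} k^r t^{m + r}. \qquad (8)$$

It follows that in (7) we may restrict the sum on $i$ to the range between 0 and $\lfloor (k+1)/2 \rfloor$ and in (8) we
can restrict the sum on $m$ to the range between 1 and $k$. We thereby obtain a relatively simple
method for expressing the $p^\sharp_k$ polynomials in terms of the $p^*_j$’s. In particular, we can deduce

$$p^\sharp_1 = p^*_1, \qquad p^\sharp_2 = p^*_2, \qquad p^\sharp_3 = p^*_3 - \tfrac{3}{2}(p^*_1)^2 + \tfrac{5}{4}p^*_1, \qquad p^\sharp_4 = p^*_4 - 4p^*_2 p^*_1 + \tfrac{11}{2}p^*_2. \qquad (9)$$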
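As a quick consistency check of the low-degree identities, consider the one-row diagram $\lambda = (3)$. The computation below is our own worked example, assuming the shifted-moment form $p^*_k(\lambda) = \sum_i \bigl((\lambda_i - i + \tfrac12)^k - (-i + \tfrac12)^k\bigr)$ and the central-character normalization $p^\sharp_k(\lambda) = n^{\downarrow k}\chi_\lambda(k, 1^{n-k})/\dim(\lambda)$.

```latex
\begin{align*}
p^*_k\bigl((3)\bigr) &= \bigl(\tfrac52\bigr)^k - \bigl(-\tfrac12\bigr)^k,
  \qquad\text{so}\qquad p^*_1 = 3, \quad p^*_3 = \tfrac{125}{8} + \tfrac18 = \tfrac{63}{4}, \\
p^*_3 - \tfrac32 (p^*_1)^2 + \tfrac54 p^*_1
  &= \tfrac{63}{4} - \tfrac{54}{4} + \tfrac{15}{4} = 6, \\
p^\sharp_3\bigl((3)\bigr) &= 3^{\downarrow 3} \cdot \frac{\chi_{(3)}\bigl((3)\bigr)}{\dim\bigl((3)\bigr)}
  = 6 \cdot \frac{1}{1} = 6 .
\end{align*}
```

The two values agree, as the third identity predicts; the last line uses only that $(3)$ indexes the trivial representation of $\mathfrak{S}_3$, whose character is identically 1.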

As observed in [IO02, Proposition 3.4], we can also deduce that in general,

$$p^\sharp_k = p^*_k + \{\text{polynomial in } p^*_1, \dots, p^*_{k-1} \text{ of gradation at most } k - 1\}, \qquad (10)$$

where gradation refers to the canonical grading in which $\prod_i p^*_{\lambda_i}$ has gradation $|\lambda|$. We can of course
inductively invert this relationship, deducing that

$$p^*_k = p^\sharp_k + \{\text{polynomial in } p^\sharp_1, \dots, p^\sharp_{k-1} \text{ of gradation at most } k - 1\}. \qquad (11)$$

For example,

$$p^*_1 = p^\sharp_1, \qquad p^*_2 = p^\sharp_2, \qquad p^*_3 = p^\sharp_3 + \tfrac{3}{2}(p^\sharp_1)^2 - \tfrac{5}{4}p^\sharp_1, \qquad p^*_4 = p^\sharp_4 + 4p^\sharp_2 p^\sharp_1 - \tfrac{11}{2}p^\sharp_2. \qquad (12)$$

Recall that the more general $p^\sharp_\tau$ polynomials (for $\tau \in \mathrm{Par}$) are known to linearly generate the
algebra of observables. This means that any product $p^\sharp_{\mu_1} p^\sharp_{\mu_2}$ can be converted to a linear combination
of $p^\sharp_\tau$’s. In particular, if we applied this conversion in (12) we would get linear expressions
for the “low-degree moments of Young diagrams” (i.e., the $p^*_k$’s) in terms of $p^\sharp_\tau$’s; we could then
compute the expectation of these, under any Schur–Weyl distribution, using Proposition 2.34.

We are therefore interested in the structure constants of $\Lambda^*$ in the basis $\{p^\sharp_\tau\}$; i.e., the
numbers $f^\tau_{\mu_1\mu_2}$ such that

$$p^\sharp_{\mu_1} p^\sharp_{\mu_2} = \sum_{\tau \in \mathrm{Par}} f^\tau_{\mu_1\mu_2}\, p^\sharp_\tau.$$

These were first determined by Ivanov and Kerov [IK01] in terms of the algebra of partial permutations.
We quote the following formulation from [IO02, Proposition 4.5]:

Proposition 2.36. Let $\tau, \mu_1, \mu_2 \in \mathrm{Par}$. Fix a set $R$ of cardinality $|\tau|$ and a permutation $w : R \to R$
of cycle type $\tau$. Then

$$f^\tau_{\mu_1\mu_2} = \frac{z_{\mu_1} z_{\mu_2}}{z_\tau}\, g^\tau_{\mu_1\mu_2},$$

where $g^\tau_{\mu_1\mu_2}$ is the number of quadruples $(R_1, w_1, R_2, w_2)$ such that:

1. $R_1 \subseteq R$, $R_2 \subseteq R$, $R_1 \cup R_2 = R$;

2. $|R_i| = |\mu_i|$ and $w_i : R_i \to R_i$ is a permutation of cycle type $\mu_i$, for $i = 1, 2$;

3. $\bar{w}_1 \bar{w}_2 = w$, where $\bar{w}_i : R \to R$ denotes the natural extension of $w_i$ from $R_i$ to the whole of $R$.

We present an equivalent formulation we have found to be more convenient. We omit its
straightforward combinatorial deduction from Proposition 2.36.

Corollary 2.37. Let

$$C_{t, r_1, r_2} := \frac{r_1!\, r_2!}{(t - r_1)!\,(t - r_2)!\,(r_1 + r_2 - t)!}$$

if the positive integers $r_1, r_2, t$ satisfy $r_1, r_2 \le t \le r_1 + r_2$, and let $C_{t, r_1, r_2} := 0$ otherwise. Then for
$\mu \vdash r_1$, $\nu \vdash r_2$, $\tau \vdash t$,

$$f^\tau_{\mu\nu} = C_{t, r_1, r_2} \cdot \Pr_{w_1, w_2}\bigl[w_1 w_2 \text{ has cycle type } \tau\bigr],$$

where $w_1$ is a uniformly random permutation on $\{1, \dots, r_1\}$ of cycle type $\mu$, and $w_2$ is a uniformly
random permutation on $\{t - r_2 + 1, \dots, t\}$ of cycle type $\nu$.

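As very simple examples, we can compute

$$(p^\sharp_1)^2 = p^\sharp_{(1,1)} + p^\sharp_1, \qquad p^\sharp_2 p^\sharp_1 = p^\sharp_{(2,1)} + 2p^\sharp_2, \qquad (p^\sharp_2)^2 = p^\sharp_{(2,2)} + 4p^\sharp_3 + 2p^\sharp_{(1,1)}. \qquad (13)$$

Substituting these into (12), we obtain the formulas

$$p^*_1 = p^\sharp_1, \qquad p^*_2 = p^\sharp_2, \qquad p^*_3 = p^\sharp_3 + \tfrac{3}{2}p^\sharp_{(1,1)} + \tfrac{1}{4}p^\sharp_1, \qquad p^*_4 = p^\sharp_4 + 4p^\sharp_{(2,1)} + \tfrac{5}{2}p^\sharp_2, \qquad (14)$$

which will be useful to us later.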
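The small structure constants quoted above can be reproduced mechanically from the probabilistic formulation of Corollary 2.37. The following brute-force sketch is ours, not the paper's: it assumes the normalizing constant $r_1!\,r_2!/((t-r_1)!\,(t-r_2)!\,(r_1+r_2-t)!)$, represents partitions as nonincreasing tuples, uses 0-based point sets, and enumerates all permutations of the relevant supports, so it is only usable for very small $t$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def cycle_type(perm, points):
    """Cycle type (nonincreasing tuple) of a permutation given as a dict on `points`."""
    seen, lengths = set(), []
    for start in points:
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def perms_of_type(points, mu):
    """All permutations of `points` (as dicts) whose cycle type is mu."""
    out = []
    for image in permutations(points):
        perm = dict(zip(points, image))
        if cycle_type(perm, points) == mu:
            out.append(perm)
    return out


def structure_constant(tau, mu, nu):
    """f^tau_{mu nu} computed as C_{t,r1,r2} * Pr[w1 w2 has cycle type tau]."""
    r1, r2, t = sum(mu), sum(nu), sum(tau)
    if not (max(r1, r2) <= t <= r1 + r2):
        return Fraction(0)
    c = Fraction(factorial(r1) * factorial(r2),
                 factorial(t - r1) * factorial(t - r2) * factorial(r1 + r2 - t))
    w1s = perms_of_type(list(range(r1)), mu)          # supported on the first r1 points
    w2s = perms_of_type(list(range(t - r2, t)), nu)   # supported on the last r2 points
    everything = list(range(t))
    hits = 0
    for w1 in w1s:
        for w2 in w2s:
            # Extend both by the identity and compose (apply w2, then w1).
            prod = {i: w1.get(w2.get(i, i), w2.get(i, i)) for i in everything}
            hits += cycle_type(prod, everything) == tau
    return c * Fraction(hits, len(w1s) * len(w2s))


for tau in [(2, 2), (3,), (1, 1)]:
    print(tau, structure_constant(tau, (2,), (2,)))
```

The three printed coefficients are those of $(p^\sharp_2)^2$ in the basis $\{p^\sharp_\tau\}$; the order of composition is immaterial here because $w_1 w_2$ and $w_2 w_1$ are conjugate and hence share a cycle type.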

Given the formula for the structure constants, it’s not hard to show that

$$p^\sharp_\mu p^\sharp_\nu = p^\sharp_{\mu \cup \nu} + \{\text{linear combination of } p^\sharp_\tau\text{’s with } |\tau| < |\mu \cup \nu|\},$$

where $\mu \cup \nu$ denotes the partition formed by joining the parts of $\mu$ and $\nu$ and sorting them in
nonincreasing order (i.e., $m_w(\mu \cup \nu) = m_w(\mu) + m_w(\nu)$). In fact, we will require a stronger statement,
based on the following notion introduced in [IK01]:

Definition 2.38. For a partition $\lambda \in \mathrm{Par}$, its weight is defined to be $\mathrm{wt}(\lambda) = |\lambda| + \ell(\lambda)$.

Now Śniady [Sni06, Corollary 3.8] proved:

Proposition 2.39. $p^\sharp_\mu p^\sharp_\nu = p^\sharp_{\mu \cup \nu} + \{\text{linear combination of } p^\sharp_\tau\text{’s with } \mathrm{wt}(\tau) \le \mathrm{wt}(\mu) + \mathrm{wt}(\nu) - 2\}$.

3 The empirical Young diagram algorithm

The empirical Young diagram (EYD) algorithm works as follows:

The EYD algorithm. Given $\rho^{\otimes n}$:

1. Sample $\lambda \sim \mathrm{SW}^n_\rho$.

2. Output $\underline{\lambda} := (\lambda_1/n, \dots, \lambda_d/n)$.
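For intuition, the output distribution of the EYD algorithm can be simulated classically: by the “quantum-free” description of $\mathrm{SW}^n_\rho$, it suffices to draw a length-$n$ word i.i.d. from the spectrum and take its RSK shape. The sketch below does exactly that; it is a classical stand-in for experimentation, not an implementation of the quantum measurement (weak Schur sampling) itself, and the function names and the example spectrum are ours.

```python
import random
from bisect import bisect_right


def rsk_shape(word):
    """Shape of the RSK insertion tableau (rows weakly increasing)."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]
        else:
            rows.append([x])
    return tuple(len(r) for r in rows)


def eyd_estimate(eta, n, rng):
    """Simulate the EYD output for a state with spectrum `eta` from n copies."""
    d = len(eta)
    word = rng.choices(range(d), weights=eta, k=n)   # i.i.d. letters from the spectrum
    shape = rsk_shape(word)
    padded = list(shape) + [0] * (d - len(shape))    # pad with zero rows up to length d
    return [part / n for part in padded]


rng = random.Random(0)
print(eyd_estimate([0.5, 0.3, 0.2], 2000, rng))
```

The returned vector is always sorted in nonincreasing order and sums to 1, so it should be compared with the sorted spectrum.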

This algorithm has, either implicitly or explicitly, arisen in several independent research threads.
The first was the work of Alicki, Rudnicki, and Sadowski [ARS88], who showed that if $\rho$ has
eigenvalues $\eta_1 \ge \cdots \ge \eta_d$, then $\underline{\lambda} \to \eta$ as $n \to \infty$, and furthermore sketched a central limit theorem
for the fluctuations. Ten years later, Keyl and Werner [KW01] independently reproved the first part
of this result (and showed an “error rate” for the EYD algorithm which, for any fixed $d$, decreases
exponentially in $n$); they also explicitly suggested the EYD algorithm for spectrum estimation.
Further independent work, developing the research on the “Gaussian Unitary Ensemble” nature of
the fluctuations, was performed by Its–Tracy–Widom, Houdré and coauthors, and others [ITW01,
Lit08, HX13].

3.1 The upper bound 

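Following Keyl and Werner’s paper [KW01], a short, simplified proof of correctness containing
explicit error bounds was discovered in [HM02]. A small bug in their derivation was corrected
by [CM06], whose Corollary 1 states:

Theorem 3.1. Let $\rho$ be a mixed state with eigenvalues $\eta_1 \ge \cdots \ge \eta_d$. Let $S$ be any set of partitions
of $n$, and set $d_{\mathrm{KL}} := \min_{\lambda \in S} d_{\mathrm{KL}}(\underline{\lambda}, \eta)$. Then

$$\Pr_{\lambda \sim \mathrm{SW}^n_\rho}[\lambda \in S] \le (n + 1)^{d(d+1)/2} \cdot e^{-n \cdot d_{\mathrm{KL}}}.$$

If we apply Theorem 3.1 with the set of partitions $S = \{\lambda \vdash n \mid d_{\mathrm{TV}}(\underline{\lambda}, \eta) \ge \epsilon\}$ and use Pinsker’s
inequality, we get the following corollary:

Corollary 3.2. Let $\rho$ be a mixed state with eigenvalues $\eta_1 \ge \cdots \ge \eta_d$. Then

$$\Pr_{\lambda \sim \mathrm{SW}^n_\rho}\bigl[d_{\mathrm{TV}}(\underline{\lambda}, \eta) \ge \epsilon\bigr] \le (n + 1)^{d(d+1)/2} \cdot e^{-2n\epsilon^2}.$$

In particular, $O(d^2/\epsilon^2) \cdot \log(d/\epsilon) \cdot \log(1/\delta)$ samples are sufficient to output an estimate $\underline{\lambda}$ satisfying
$d_{\mathrm{TV}}(\underline{\lambda}, \eta) \le \epsilon$ with probability at least $1 - \delta$.

This means that any unitarily invariant property of mixed states is testable with $O(d^2/\epsilon^2) \cdot \log(d/\epsilon)$
copies.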

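We now give a simplified proof of Theorem 3.1. This will largely follow the outline of the
proof found in [HM02, CM06], except we will reinterpret their majorizing step in light of the RSK
algorithm.

Proof of Theorem 3.1. Define the probability distribution $\mathcal{D} = (\eta_1, \dots, \eta_d)$. For a fixed partition
$\lambda \in S$, Remark 2.24 shows that upper-bounding $\mathrm{SW}^n_\rho(\lambda)$ is equivalent to upper-bounding
$\Pr_{a \sim \mathcal{D}^{\otimes n}}[\mathrm{RSK}(a) = \lambda]$. By Proposition 2.15, $\mathrm{RSK}(a) = \lambda$ only if $\lambda$ majorizes $c(a)$.
By Remark 2.27, there are exactly $\dim(\mathrm{p}_\lambda) \cdot \dim(\mathrm{q}^d_\lambda)$ words $a \in [d]^n$ for which $\mathrm{RSK}(a) = \lambda$. By
the majorizing step, the probability that such an $a$ is drawn from $\mathcal{D}^{\otimes n}$ is

$$\prod_i \eta_i^{c_i(a)} \le \prod_i \eta_i^{\lambda_i}.$$

From this point on, the rest of the argument is as in [HM02, CM06]. Recall the well-known upper
bounds (cf. [Chr06, Equations (1.21) and (1.22)])

$$\dim(\mathrm{p}_\lambda) \le \frac{n!}{\lambda_1! \cdots \lambda_d!}, \qquad \dim(\mathrm{q}^d_\lambda) \le (n + 1)^{d(d-1)/2}.$$

Thus, we can upper-bound $\Pr_{a \sim \mathcal{D}^{\otimes n}}[\mathrm{RSK}(a) = \lambda]$ by

$$(n + 1)^{d(d-1)/2} \cdot \frac{n!}{\lambda_1! \cdots \lambda_d!}\prod_i \eta_i^{\lambda_i} \le (n + 1)^{d(d-1)/2} \cdot e^{-n \cdot d_{\mathrm{KL}}(\underline{\lambda}, \eta)}.$$

To recover Theorem 3.1, we now union bound over all $\lambda \in S$, of which there are at most $(n + 1)^d$. □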

3.2 The lower bound 

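Our main result of this section is that Corollary 3.2 is nearly tight, even when $\rho$ is the maximally
mixed state. In particular, we show the following lower bound:

Theorem 3.3. There is a $\delta > 0$ such that for sufficiently small values of $\epsilon$,

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}\bigl[d_{\mathrm{TV}}(\underline{\lambda}, \mathrm{Unif}_d) > \epsilon\bigr] \ge \delta$$

unless $n = \Omega(d^2/\epsilon^2)$.

We will split the lower bound into two cases.

Theorem 3.4. For every constant $C > 0$, there are constants $\delta, \epsilon > 0$ such that

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}\bigl[d_{\mathrm{TV}}(\underline{\lambda}, \mathrm{Unif}_d) > \epsilon\bigr] \ge \delta$$

when $n \le Cd^2$ and $d$ is sufficiently large.

Theorem 3.5. There are absolute constants $C > 0$ and $0 < \delta < 1$ such that

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}\bigl[d_{\mathrm{TV}}(\underline{\lambda}, \mathrm{Unif}_d) > \epsilon\bigr] \ge \delta$$

when $n \ge Cd^2$, unless $n = \Omega(d^2/\epsilon^2)$.

To prove Theorem 3.3, let $C$ and $\delta_1$ be the constants in Theorem 3.5. Apply Theorem 3.4 with
the value of $C$, and let $\delta_2$ and $\epsilon_0$ be the resulting constants. Set $\delta := \min\{\delta_1, \delta_2\}$. Then we see that
for all $\epsilon \le \epsilon_0$,

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}\bigl[d_{\mathrm{TV}}(\underline{\lambda}, \mathrm{Unif}_d) > \epsilon\bigr] \ge \delta$$

unless $n = \Omega(d^2/\epsilon^2)$, giving Theorem 3.3.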

Theorem 3.4 might appear somewhat superfluous, as Theorem 3.5 already proves the lower
bound for sufficiently large values of $n$ (i.e., $n \ge Cd^2$), and intuitively having fewer copies of $\rho$
shouldn’t improve the performance of the EYD algorithm. However, this intuition, though it may
be true in some approximate sense, is false in general: there are regimes of state estimation where
the performance of the EYD algorithm does not increase monotonically with the value of $n$. For
example, if $n$ is a multiple of $d$, then when $\lambda \sim \mathrm{SW}^n_d$, $\underline{\lambda}$ will equal $\mathrm{Unif}_d$ with some nonzero
probability. On the other hand, a random $\lambda \sim \mathrm{SW}^{n+1}_d$ will never be uniform, because $n + 1$ is not a
multiple of $d$. Thus, decreasing the value of $n$ can sometimes help (according to some performance
metrics), and this shows why we need Theorem 3.4 to supplement Theorem 3.5.

The proof of Theorem 3.4 is quite technical, and we defer it to Section 7. Our proof of Theorem 3.5
is simpler and appears below. It is a good illustration of the basic technique of using polynomial
functions on Young diagrams. The intuition behind the proof is as follows: By the (traceless)
Gaussian Unitary Ensemble fluctuations predicted in [ITW01], we expect that for $\lambda \sim \mathrm{SW}^n_d$, the
empirical distribution $\underline{\lambda}$ will deviate from $\mathrm{Unif}_d$ by roughly $\Theta(1/\sqrt{n})$ in each coordinate. This will
yield total variation distance $\Theta(d/\sqrt{n})$, necessitating $n \ge \Omega(d^2/\epsilon^2)$ to achieve $d_{\mathrm{TV}}(\underline{\lambda}, \mathrm{Unif}_d) \le \epsilon$.
Actually analyzing the precise rate of convergence to Gaussian fluctuations in terms of $n$ is difficult,
and is overkill anyway; instead, we use the Fourth Moment Method to lower bound the fluctuations.

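Proof of Theorem 3.5. Our goal is to show that for $n \ge 10^{10} d^2$, with 1% probability over a random
$\lambda \sim \mathrm{SW}^n_d$, at least $d/200$ coordinates $i \in [d]$ satisfy

$$\Bigl|\lambda_i - \frac{n}{d}\Bigr| \ge \frac{\sqrt{n}}{1000}.$$

When this event occurs,

$$d_{\mathrm{TV}}(\underline{\lambda}, \mathrm{Unif}_d) = \frac{1}{2}\sum_{i=1}^{d}\Bigl|\frac{\lambda_i}{n} - \frac{1}{d}\Bigr| = \frac{1}{2n}\sum_{i=1}^{d}\Bigl|\lambda_i - \frac{n}{d}\Bigr| \ge \frac{1}{2} \cdot \frac{d}{200} \cdot \frac{1}{n} \cdot \frac{\sqrt{n}}{1000} = \frac{1}{400000} \cdot \frac{d}{\sqrt{n}},$$

which is bigger than $\epsilon$ unless $n = \Omega(d^2/\epsilon^2)$. Showing this will prove Theorem 3.5 with the parameters
$C = 10^{10}$ and $\delta = .01$.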

To begin, let us define a family of polynomials. 

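Definition 3.6. Given $k \ge 1$ and $c \in \mathbb{R}$, we define $p^*_{k,c}(\lambda) := \sum_{i=1}^{d} (\lambda_i - i - c)^k - (-i - c)^k$.

This generalizes the definition of the $p^*_k$ polynomials, as $p^*_{k,-1/2} = p^*_k$.

Fact 3.7. Let $c \in \mathbb{R}$. Then

• $p^*_{2,c} = (-2c - 1)p^\sharp_1 + p^\sharp_2$, and

• $p^*_{4,c} = (-4c^3 - 6c^2 - 4c - 1)p^\sharp_1 + (6c^2 + 6c + 4)p^\sharp_2 + (-6c - 3)p^\sharp_{(1,1)} + (-4c - 2)p^\sharp_3 + 4p^\sharp_{(2,1)} + p^\sharp_4$.

Proof. By explicit computation, one can check that

$$p^*_{2,c} = 2(-c - \tfrac{1}{2})p^*_1 + p^*_2, \qquad p^*_{4,c} = 4(-c - \tfrac{1}{2})^3 p^*_1 + 6(-c - \tfrac{1}{2})^2 p^*_2 + 4(-c - \tfrac{1}{2})p^*_3 + p^*_4.$$

(Indeed, it’s not hard to show that in general, $p^*_{k,c} = \sum_{j=1}^{k}\binom{k}{j}(-c - \tfrac{1}{2})^{k-j}p^*_j$.) The claim now
follows from (14). □

For any $c$, these formulas allow us to compute the expected value of $p^*_{2,c}$ and $p^*_{4,c}$ over a random
$\lambda \sim \mathrm{SW}^n_d$, by using Corollary 2.35. Furthermore, for any $k$ and $d$, $\sum_{i=1}^{d}(-i - c)^k$ is a constant
which doesn’t depend on $\lambda$. Combining these two facts allows us to compute the average value over a
random $\lambda \sim \mathrm{SW}^n_d$ of $\sum_{i=1}^{d}(\lambda_i - i - c)^k$, for $k = 2, 4$. In particular, we are interested in computing
this expectation when $c = \frac{n}{d}$. Write $L_i := \lambda_i - i - \frac{n}{d}$. Then

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i=1}^{d} L_i^2\Bigr] = -\frac{n}{d} + nd + \frac{d^3}{3} + \frac{d^2}{2} + \frac{d}{6} \ge -\frac{n}{d} + nd \ge \frac{3}{4}nd, \qquad (15)$$

where in the last step we used the fact that $n/d \le nd/4$, because $d \ge 2$.
Similarly, as $n \ge 10^{10}d^2 \ge d^2$, we can use the bound

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i=1}^{d} L_i^4\Bigr] = 2n - \frac{4n}{d^2} - \frac{6n}{d^3} - \frac{5n^2}{d} + 2nd^2 + \frac{d^3}{3} + \frac{d^4}{2} + \frac{d^5}{5} + \frac{4n}{d} + nd^3 + 2n^2 d + nd - \frac{d}{30} + \frac{3n^2}{d^3}$$
$$\le 2n + 2nd^2 + \frac{d^3}{3} + \frac{d^4}{2} + \frac{d^5}{5} + \frac{4n}{d} + nd^3 + 2n^2 d + nd + \frac{3n^2}{d^3}$$
$$\le 6n^2 d,$$

where in the last step we have used only trivial bounds involving the facts that $n \ge d^2$ and $d \ge 2$.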
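The second and fourth moment expressions above are exact identities in $n$ and $d$, so they can be confirmed by exhaustive enumeration for small parameters. The sketch below (our own check, with hypothetical function names) computes $\mathbf{E}[\sum_i L_i^2]$ and $\mathbf{E}[\sum_i L_i^4]$ for $L_i = \lambda_i - i - n/d$ by enumerating all words in $[d]^n$ and applying RSK, and compares them with the polynomial expressions as written out here.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import product


def rsk_shape(word):
    """Shape of the RSK insertion tableau (rows weakly increasing)."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]
        else:
            rows.append([x])
    return tuple(len(r) for r in rows)


def exact_moments(n, d):
    """Exact E[sum_i L_i^2] and E[sum_i L_i^4] under SW^n_d, with L_i = lambda_i - i - n/d."""
    c = Fraction(n, d)
    m2 = m4 = Fraction(0)
    for a in product(range(d), repeat=n):
        shape = rsk_shape(a)
        lam = list(shape) + [0] * (d - len(shape))
        for i, part in enumerate(lam, start=1):
            L = part - i - c
            m2 += L**2
            m4 += L**4
    return m2 / d**n, m4 / d**n


def formula_m2(n, d):
    n, d = Fraction(n), Fraction(d)
    return -n / d + n * d + d**3 / 3 + d**2 / 2 + d / 6


def formula_m4(n, d):
    n, d = Fraction(n), Fraction(d)
    return (2 * n - 4 * n / d**2 - 6 * n / d**3 - 5 * n**2 / d + 2 * n * d**2
            + d**3 / 3 + d**4 / 2 + d**5 / 5 + 4 * n / d + n * d**3
            + 2 * n**2 * d + n * d - d / 30 + 3 * n**2 / d**3)


for n, d in [(2, 2), (3, 2), (4, 3)]:
    print((n, d), exact_moments(n, d) == (formula_m2(n, d), formula_m4(n, d)))
```

Because everything is done in exact rational arithmetic, agreement here is an equality check rather than a numerical approximation; the inequalities $\ge \tfrac34 nd$ and $\le 6n^2 d$ additionally require $d \ge 2$ and $n \ge d^2$.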
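For a fixed $\lambda$, let $\mathcal{L}(\lambda) := \{i \in [d] \mid |L_i| \ge 5\sqrt{n}\}$. Then

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i \in \mathcal{L}(\lambda)} L_i^2\Bigr] \le \frac{1}{25n}\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i \in \mathcal{L}(\lambda)} L_i^4\Bigr] \le \frac{1}{25n}\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i=1}^{d} L_i^4\Bigr] \le \frac{nd}{4}.$$

Thus, by (15),

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i \in [d] \setminus \mathcal{L}(\lambda)} L_i^2\Bigr] \ge \frac{nd}{2}.$$

Now define

$$\mathcal{M}(\lambda) := \Bigl\{i \in [d] \;\Bigm|\; \frac{\sqrt{n}}{200} \le |L_i| < 5\sqrt{n}\Bigr\},$$

and let $\mathcal{E}$ be the event that $|\mathcal{M}(\lambda)| \ge d/200$. We claim that $p = \Pr[\mathcal{E}] \ge 1/100$. This is because if
$p < 1/100$, then

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\sum_{i \in [d] \setminus \mathcal{L}(\lambda)} L_i^2\Bigr] \le p \cdot 25nd + (1 - p)\Bigl(\frac{25nd}{200} + \Bigl(1 - \frac{1}{200}\Bigr)\frac{nd}{200^2}\Bigr) < \frac{nd}{2},$$

which is a contradiction.

Now let us use the assumption that $n \ge 10^{10}d^2$. Consider any coordinate $i \in [d]$ satisfying

$$|L_i| = \Bigl|\lambda_i - i - \frac{n}{d}\Bigr| \ge \frac{\sqrt{n}}{200}.$$

By our assumption that $n \ge 10^{10}d^2$, this implies that

$$\Bigl|\lambda_i - \frac{n}{d}\Bigr| \ge \frac{\sqrt{n}}{1000}.$$

As a result, when $\mathcal{E}$ holds, which happens with at least 1% probability, there are at least $d/200$
coordinates $i \in [d]$ such that

$$\Bigl|\lambda_i - \frac{n}{d}\Bigr| \ge \frac{\sqrt{n}}{1000}.$$

This completes the proof. □

4 A quantum Paninski theorem

In this section, we prove Theorem 1.10, that $\Theta(d/\epsilon^2)$ copies are necessary and sufficient to test
whether or not a given state $\rho \in \mathbb{C}^{d \times d}$ is the maximally mixed state, i.e., has spectrum $(\frac{1}{d}, \dots, \frac{1}{d})$.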

4.1 The upper bound 

The upper bound for Theorem 1.10 will follow from our analysis of the following simple algorithm.

Mixedness Tester. Given $\rho^{\otimes n}$, where $\rho$ is $d$-dimensional:

1. Sample $\lambda \sim \mathrm{SW}^n_\rho$.

2. Accept if $p^\sharp_2(\lambda) \le (1 + 2\epsilon^2) \cdot \frac{n(n-1)}{d}$. Reject otherwise.

We remark that the tester Childs et al. [CHW07] used to distinguish the maximally mixed states
of dimension $d/2$ and $d$ also depended only on the magnitude of $p^\sharp_2(\lambda) = 2c_1(\lambda)$; see [CHW07,
equations (49), (50)].

Theorem 4.1. The Mixedness Tester can test whether a state $\rho \in \mathbb{C}^{d \times d}$ is the maximally mixed
state using $n = O(d/\epsilon^2)$ copies of $\rho$.

Proof. We will run the Mixedness Tester with $n = 100d/\epsilon^2$. Both the “completeness” and the
“soundness” analysis will require the last identity from (13), namely

$$(p^\sharp_2)^2 = p^\sharp_{(2,2)} + 4p^\sharp_3 + 2p^\sharp_{(1,1)}. \qquad (16)$$

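Completeness. Suppose first that $\rho$ is the maximally mixed state, so that in fact $\lambda \sim \mathrm{SW}^n_d$. We
compute the mean and variance of $p^\sharp_2(\lambda)$ using (16) and Corollary 2.35:

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[p^\sharp_2(\lambda)\bigr] = \frac{n(n-1)}{d}, \qquad (17)$$

$$\mathop{\mathbf{Var}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[p^\sharp_2(\lambda)\bigr] = \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[p^\sharp_2(\lambda)^2\bigr] - \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[p^\sharp_2(\lambda)\bigr]^2 = \frac{n^{\downarrow 4}}{d^2} + \frac{4n^{\downarrow 3}}{d^2} + 2n(n-1) - \frac{n^2(n-1)^2}{d^2} = 2n(n-1)\Bigl(1 - \frac{1}{d^2}\Bigr). \qquad (18)$$

Thus by Chebyshev’s inequality,

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}\Bigl[p^\sharp_2(\lambda) > (1 + 2\epsilon^2)\frac{n(n-1)}{d}\Bigr] \le \frac{2n(n-1)}{\bigl(2\epsilon^2 n(n-1)/d\bigr)^2} = \frac{d^2}{2n(n-1)\epsilon^4} \le \frac{1}{3},$$

by our choice of $n$. Thus indeed when $\rho$ is the maximally mixed state, the Mixedness Tester accepts
with probability at least 2/3.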

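Soundness. Suppose now that $\rho$ is a density matrix whose spectrum $\eta = (\eta_1, \dots, \eta_d)$ satisfies
$d_{\mathrm{TV}}(\eta, \mathrm{Unif}_d) \ge \epsilon$. Writing $\eta_i = \frac{1}{d} + \Delta_i$, this means that

$$\epsilon \le \frac{1}{2}\sum_{i=1}^{d}|\Delta_i| \le \frac{1}{2}\sqrt{d}\sqrt{\sum_{i=1}^{d}\Delta_i^2},$$

using Cauchy–Schwarz; hence

$$\sum_{i=1}^{d}\Delta_i^2 \ge \frac{4\epsilon^2}{d}. \qquad (19)$$

Using (16) and Proposition 2.34, we can calculate the difference between the mean of $p^\sharp_2(\lambda)$ and
the cutoff used by the Mixedness Tester as

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_\rho}\bigl[p^\sharp_2(\lambda)\bigr] - (1 + 2\epsilon^2)\frac{n(n-1)}{d} = n(n-1)\Bigl(\sum_{i=1}^{d}\eta_i^2 - \frac{1 + 2\epsilon^2}{d}\Bigr) = n(n-1)\Bigl(\sum_{i=1}^{d}\Delta_i^2 - \frac{2\epsilon^2}{d}\Bigr) \ge \frac{n(n-1)}{2}\sum_{i=1}^{d}\Delta_i^2,$$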


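where the last line follows from (19). Similarly, we can calculate the variance of $p^\sharp_2(\lambda)$ as

$$\mathop{\mathbf{Var}}_{\lambda \sim \mathrm{SW}^n_\rho}\bigl[p^\sharp_2(\lambda)\bigr] = n(n-1)\Bigl(2 + 4n\Bigl(\sum_i \eta_i^3 - \Bigl(\sum_i \eta_i^2\Bigr)^2\Bigr) - 8\sum_i \eta_i^3 + 6\Bigl(\sum_i \eta_i^2\Bigr)^2\Bigr)$$
$$\le n(n-1)\Bigl(8 + 4n\Bigl(\sum_i \eta_i^3 - \Bigl(\sum_i \eta_i^2\Bigr)^2\Bigr)\Bigr)$$
$$= n(n-1)\Bigl(8 + 4n\Bigl(\frac{1}{d}\sum_i \Delta_i^2 + \sum_i \Delta_i^3 - \Bigl(\sum_i \Delta_i^2\Bigr)^2\Bigr)\Bigr)$$
$$\le n(n-1)\Bigl(8 + 8n\sum_i \Delta_i^2\Bigr).$$

Applying Chebyshev’s inequality gives us

$$\Pr_{\lambda \sim \mathrm{SW}^n_\rho}\Bigl[p^\sharp_2(\lambda) \le (1 + 2\epsilon^2)\frac{n(n-1)}{d}\Bigr] \le \frac{32 + 32n\sum_{i=1}^{d}\Delta_i^2}{n(n-1)\bigl(\sum_{i=1}^{d}\Delta_i^2\bigr)^2} \le \frac{4}{n^2(\epsilon^2/d)^2} + \frac{16}{n(\epsilon^2/d)},$$

where the second step follows from (19). By our choice of $n$, this is at most 1/3. Thus, when $\rho$ is
$\epsilon$-far from the maximally mixed state, the Mixedness Tester rejects with probability at least 2/3,
as required. □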
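The decision rule of the Mixedness Tester is simple enough to write down directly once $\lambda$ is available. The sketch below is a classical simulation aid, not a quantum implementation: it takes $\lambda$ to be the RSK shape of a word (which has the distribution $\mathrm{SW}^n_\rho$ when the letters are drawn i.i.d. from the spectrum of $\rho$), and it assumes the acceptance cutoff $(1 + 2\epsilon^2)\,n(n-1)/d$ together with the identity $p^\sharp_2(\lambda) = 2c_1(\lambda)$. Function names are ours.

```python
from bisect import bisect_right


def rsk_shape(word):
    """Shape of the RSK insertion tableau (rows weakly increasing)."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]
        else:
            rows.append([x])
    return tuple(len(r) for r in rows)


def p_sharp_2(shape):
    """p#_2(lambda) = 2 * (sum of contents of the boxes of lambda)."""
    return 2 * sum(j - i for i, r in enumerate(shape) for j in range(r))


def mixedness_tester(word, d, eps):
    """Accept (True) iff p#_2(RSK shape of word) <= (1 + 2 eps^2) n (n - 1) / d."""
    n = len(word)
    cutoff = (1 + 2 * eps**2) * n * (n - 1) / d   # assumed cutoff, see lead-in
    return p_sharp_2(rsk_shape(word)) <= cutoff


print(mixedness_tester([0] * 8, d=2, eps=0.5))        # word from a pure state
print(mixedness_tester([1, 1, 0, 0], d=2, eps=0.5))   # word with a balanced shape
```

A constant word yields the one-row shape $(n)$ with $p^\sharp_2 = n(n-1)$, which exceeds the cutoff whenever $(1 + 2\epsilon^2) < d$, so such an input is always rejected; for meaningful acceptance statistics one would draw many words of length about $100d/\epsilon^2$.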

4.2 The lower bound: overview 

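For almost all of the lower bound proof we will assume $d$ is even. In the end we will indicate how
to obtain the lower bound when $d$ is odd. For $0 \le \epsilon \le \frac{1}{2}$, let $\mathcal{D}^\epsilon_d$ denote the probability distribution
on $[d]$ in which

$$\mathcal{D}^\epsilon_d(i) = \begin{cases} \dfrac{1 + 2\epsilon}{d} & \text{if } i \text{ is odd}, \\ \dfrac{1 - 2\epsilon}{d} & \text{if } i \text{ is even}. \end{cases}$$

This is essentially the same probability distribution that Paninski [Pan08] studies in his lower
bound. As usual, we also identify $\mathcal{D}^\epsilon_d$ with the diagonal density matrix having these entries; i.e.,

$$\mathcal{D}^\epsilon_d = \mathrm{diag}\Bigl(\frac{1 + 2\epsilon}{d}, \frac{1 - 2\epsilon}{d}, \frac{1 + 2\epsilon}{d}, \frac{1 - 2\epsilon}{d}, \dots, \frac{1 + 2\epsilon}{d}, \frac{1 - 2\epsilon}{d}\Bigr).$$

Note that $d_{\mathrm{TV}}(\mathcal{D}^\epsilon_d, \mathrm{Unif}_d) = \epsilon$. We also remark that when $\epsilon = \frac{1}{2}$, the distribution $\mathcal{D}^\epsilon_d$ is the uniform
distribution on $\frac{d}{2}$ elements (the odd-numbered ones). As in [Pan08], it proves to be most convenient
to study the chi-squared distance between $\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}$ and $\mathrm{SW}^n_d$; our main theorem is the following:

Theorem 4.2. $d_{\chi^2}(\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}, \mathrm{SW}^n_d) \le \exp((4n\epsilon^2/d)^2) - 1$.

Since this distance is small unless $n = \Omega(d/\epsilon^2)$, our lower bound is complete. More precisely: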

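Corollary 4.3. For even $d$, testing whether a $d$-dimensional mixed state $\rho$ has the property of
being the maximally mixed state requires $n \ge .15d/\epsilon^2$ copies.

Proof. In light of Lemma 2.22 we know that any $\epsilon$-tester may as well make its testing decision
based on a draw $\lambda \sim \mathrm{SW}^n_\rho$. Since $d_{\mathrm{TV}}(\mathcal{D}^\epsilon_d, \mathrm{Unif}_d) = \epsilon$, the tester must be able to distinguish a draw
from $\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}$ and a draw from $\mathrm{SW}^n_d$ with probability advantage 1/3; this is possible if and only if
$d_{\mathrm{TV}}(\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}, \mathrm{SW}^n_d) \ge 1/3$. But

$$d_{\mathrm{TV}}(\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}, \mathrm{SW}^n_d) \le \tfrac{1}{2}\sqrt{d_{\chi^2}(\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}, \mathrm{SW}^n_d)} \le \tfrac{1}{2}\sqrt{\exp((4n\epsilon^2/d)^2) - 1} < 1/3$$

if $n \le .15d/\epsilon^2$. □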
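The constant $.15$ in Corollary 4.3 comes from a one-line numerical evaluation, which the following snippet reproduces (the function name is ours). It evaluates the upper bound $\tfrac12\sqrt{\exp((4n\epsilon^2/d)^2) - 1}$ on the total variation distance and shows that it stays just below $1/3$ at $n = .15\,d/\epsilon^2$.

```python
from math import exp, sqrt


def tv_upper_bound(n, d, eps):
    """(1/2) * sqrt(exp((4 n eps^2 / d)^2) - 1), the bound implied by Theorem 4.2."""
    chi_squared_bound = exp((4 * n * eps**2 / d) ** 2) - 1
    return 0.5 * sqrt(chi_squared_bound)


d, eps = 1000, 0.1
n = 0.15 * d / eps**2
print(n, tv_upper_bound(n, d, eps))
```

The bound depends on $n$, $d$, $\epsilon$ only through $n\epsilon^2/d$, so the same threshold applies for every even $d$ and every admissible $\epsilon$.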

We remark that by taking $\epsilon = \frac{1}{2}$ we exactly recover the lower bound from Theorem 1.9 due to
Childs et al. [CHW07].

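There are two major steps in the proof of Theorem 4.2. The first major step will be proving
the following formula:

Theorem 4.4. Let $x \in \mathbb{R}^d$ satisfy $x_1 + \cdots + x_d = 0$. Then

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\Bigl(\frac{s_\lambda(1 + x_1, \dots, 1 + x_d)}{s_\lambda(1, \dots, 1)} - 1\Bigr)^2\Bigr] = \sum_{\substack{\mu \in \mathrm{Par} \\ 0 < \ell(\mu) \le d}} \frac{s_\mu(x)^2}{d^{\uparrow\mu} \cdot d^{|\mu|}} \cdot n^{\downarrow|\mu|}.$$

(The sum has only finitely many terms since $n^{\downarrow|\mu|} = 0$ when $|\mu| > n$.)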
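To see the formula in action, here is a worked instance of our own with $d = n = 2$ and $x = (t, -t)$. It assumes the normalization of Theorem 4.4 in which the left-hand side is the second moment of the Schur ratio minus one, and it reads $d^{\uparrow\mu}$ as $\prod_{\square \in \mu}(d + c(\square))$. Under $\mathrm{SW}^2_2$ the shapes $(2)$ and $(1,1)$ occur with probabilities $\tfrac34$ and $\tfrac14$.

```latex
\begin{align*}
\frac{s_{(2)}(1+t,\,1-t)}{s_{(2)}(1,1)} - 1 &= \frac{3 + t^2}{3} - 1 = \frac{t^2}{3},
&
\frac{s_{(1,1)}(1+t,\,1-t)}{s_{(1,1)}(1,1)} - 1 &= (1 - t^2) - 1 = -t^2, \\[4pt]
\text{LHS} &= \frac34 \cdot \frac{t^4}{9} + \frac14 \cdot t^4 = \frac{t^4}{3},
&& \\[4pt]
\text{RHS} &= \underbrace{0}_{\mu = (1)}
  + \underbrace{\frac{(t^2)^2}{(2 \cdot 3) \cdot 2^2} \cdot 2}_{\mu = (2)}
  + \underbrace{\frac{(-t^2)^2}{(2 \cdot 1) \cdot 2^2} \cdot 2}_{\mu = (1,1)}
  = \frac{t^4}{12} + \frac{t^4}{4} = \frac{t^4}{3}. &&
\end{align*}
```

Here $s_{(1)}(x) = x_1 + x_2 = 0$, $s_{(2)}(x) = t^2$, $s_{(1,1)}(x) = -t^2$, and $n^{\downarrow 2} = 2$; partitions with $|\mu| > 2$ contribute nothing because $n^{\downarrow|\mu|} = 0$.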


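Once the above theorem is established, the following consequence is essentially immediate:

Corollary 4.5. Let $x \in \mathbb{R}^d$ satisfy $x_1 + \cdots + x_d = 0$ and $x_i \ge -1$ for all $i$. We write $\mathcal{Q}_x$ for the
probability distribution on $[d]$ in which $i$ has probability $\frac{1 + x_i}{d}$. Then

$$d_{\chi^2}(\mathrm{SW}^n_{\mathcal{Q}_x}, \mathrm{SW}^n_d) = \sum_{\substack{\mu \in \mathrm{Par} \\ 0 < \ell(\mu) \le d}} \frac{s_\mu(x)^2}{d^{\uparrow\mu} \cdot d^{|\mu|}} \cdot n^{\downarrow|\mu|}.$$

Proof. By definition, $d_{\chi^2}(\mathrm{SW}^n_{\mathcal{Q}_x}, \mathrm{SW}^n_d)$ is equal to

$$\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\Bigl(\frac{\mathrm{SW}^n_{\mathcal{Q}_x}(\lambda)}{\mathrm{SW}^n_d(\lambda)} - 1\Bigr)^2\Bigr] = \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\Bigl[\Bigl(\frac{\dim(\lambda)\, s_\lambda\bigl(\frac{1 + x_1}{d}, \dots, \frac{1 + x_d}{d}\bigr)}{\dim(\lambda)\, s_\lambda\bigl(\frac{1}{d}, \dots, \frac{1}{d}\bigr)} - 1\Bigr)^2\Bigr],$$

where we used Proposition 2.23. In turn, this equals the quantity on the left in Theorem 4.4 after
canceling the common factor of $d^{-|\lambda|}\dim\lambda$ in the fraction (recall the homogeneity of the Schur
polynomials). □

Let us sketch the intuition of the proof once Theorem 4.4 is established. We are ultimately
interested in the case $x = 2\epsilon \cdot c$, where $\epsilon > 0$ is thought of as “small” and $c \in \mathbb{R}^d$ satisfies
$c_1 + \cdots + c_d = 0$; specifically, $c = c_\pm := (+1, -1, +1, -1, \dots, +1, -1)$. For simplicity, let us write
$\epsilon$ instead of $2\epsilon$. Since $s_\mu$ is homogeneous of degree $|\mu|$, this means $s_\mu(x)^2 = \epsilon^{2|\mu|} s_\mu(c)^2$. For the
sake of intuition, let us consider the summands in Theorem 4.4 when $|\mu| = k$ is “small”; i.e., the
coefficients on $\epsilon^{2k}$. For $k = 1$ we have only $\mu = (1)$, and the associated summand actually drops
out: this is because $s_{(1)}(x) = x_1 + \cdots + x_d = 0$. For $k \ge 2$, the $n^{\downarrow k}$ term is asymptotically $n^k$
and the denominator is asymptotically $d^{2k}$. It remains to analyze $s_\mu(c_\pm)$. This is the
second major step in the proof of Theorem 4.2: in Section 4.4 we establish an exact formula for it.
Naively one might expect $|s_\mu(c_\pm)|$ to scale like $d^k$ when $|\mu| = k$; however, as we will see it scales
only like $d^{k/2}$ (and will in fact be 0 whenever $k$ is odd). Thus the summands with $|\mu| = k$ small
scale asymptotically as $(\epsilon^2 n/d)^k$, whence we get that $d_{\chi^2}(\mathrm{SW}^n_{\mathcal{D}^\epsilon_d}, \mathrm{SW}^n_d)$ is small if $n \ll d/\epsilon^2$.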


4.3 Proof of Theorem 4.4 


To analyze the quantity in Theorem 4.4 we will require the so-called Binomial Formula. (It generalizes
the “usual” Binomial Formula, viz. $(1 + x)^\lambda = \sum_{m \ge 0} \frac{\lambda^{\downarrow m}}{m!}x^m$.)

Theorem 4.6. The following polynomial identity holds:

$$\frac{s_\lambda(1 + x_1, \dots, 1 + x_d)}{s_\lambda(1, \dots, 1)} = \sum_{\mu \in \mathrm{Par}} \frac{s_\mu(x)}{d^{\uparrow\mu}}\, s^*_\mu(\lambda).$$

(The sum is actually finite since we may include the restriction $\mu \subseteq \lambda$ due to the factor $s^*_\mu(\lambda)$.)

In this form with the shifted Schur polynomials, the result appears in Okounkov and Olshanski’s
work [OO98b, Theorem 5.1] (see also [OO98a]). In a form involving factorial Schur polynomials it
dates back to Lascoux [Las78]; see [Mac95, Example I.3.10].

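The $\mu = \emptyset$ summand in Theorem 4.6 is always equal to 1; it follows that the quantity on the left of Theorem 4.4 is

\[
\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\left[\Bigl(\sum_{0 < \ell(\mu) \le d} \frac{s_\mu(x)}{d^{\uparrow\mu}}\, s^*_\mu(\lambda)\Bigr)^2\right]
= \sum_{0 < \ell(\mu), \ell(\nu) \le d} \frac{s_\mu(x) s_\nu(x)}{d^{\uparrow\mu} d^{\uparrow\nu}} \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[s^*_\mu(\lambda) s^*_\nu(\lambda)\bigr].
\]

Therefore proving Theorem 4.4 reduces to proving

\[
x_1 + \cdots + x_d = 0 \implies \sum_{0 < \ell(\mu), \ell(\nu) \le d} \frac{s_\mu(x) s_\nu(x)}{d^{\uparrow\mu} d^{\uparrow\nu}} \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[s^*_\mu(\lambda) s^*_\nu(\lambda)\bigr] = \sum_{0 < \ell(\mu) \le d} \frac{s_\mu(x)^2}{d^{\uparrow\mu} \cdot d^{|\mu|}} \cdot n^{\downarrow|\mu|}. \tag{20}
\]

This is the main difficult step of the proof; the surprising aspect here is that we only get a contribution on the order of $n^k$ from the terms with $|\mu| = k$, whereas naively one would expect $n^{2k}$. Showing that the higher-order contributions “drop out” is the essence of the proof.

In aid of proving (20), it’s tempting to guess that $\mathbf{E}[s^*_\mu(\lambda) s^*_\nu(\lambda)] = 1_{\{\mu = \nu\}} \cdot \frac{d^{\uparrow\mu}}{d^{|\mu|}} n^{\downarrow|\mu|}$; however such a statement is false. Instead, what is true is the following: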

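Theorem 4.7. Let $x \in \mathbb{R}^d$ satisfy $x_1 + \cdots + x_d = 0$ and let $\mu \in \mathrm{Par}$ satisfy $|\mu| = r_1$ and $0 < \ell(\mu) \le d$. Assume $r_2 \ge r_1$. Then

\[
\sum_{\substack{|\nu| = r_2 \\ \ell(\nu) \le d}} \frac{s_\nu(x)}{d^{\uparrow\nu}} \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[s^*_\mu(\lambda) s^*_\nu(\lambda)\bigr] = 1_{\{r_1 = r_2\}} \cdot s_\mu(x) \cdot \frac{n^{\downarrow|\mu|}}{d^{|\mu|}}.
\]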
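To deduce (20) from Theorem 4.7, simply write

\[
\sum_{0 < \ell(\mu), \ell(\nu) \le d} \frac{s_\mu(x) s_\nu(x)}{d^{\uparrow\mu} d^{\uparrow\nu}} \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[s^*_\mu(\lambda) s^*_\nu(\lambda)\bigr]
= \sum_{r_1, r_2 > 0}\ \sum_{\substack{|\mu| = r_1 \\ \ell(\mu) \le d}}\ \sum_{\substack{|\nu| = r_2 \\ \ell(\nu) \le d}} \frac{s_\mu(x) s_\nu(x)}{d^{\uparrow\mu} d^{\uparrow\nu}} \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_d}\bigl[s^*_\mu(\lambda) s^*_\nu(\lambda)\bigr].
\]

Then use Theorem 4.7 when $r_2 \ge r_1$ and use it with the roles of $\mu$ and $\nu$ reversed when $r_2 < r_1$.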

As for the proof of Theorem 4.7 itself, the first step is to compute the expected product of the shifted Schur polynomials. One possible approach for this might be to use the Littlewood-Richardson rule for factorial Schur functions (see [MS99, Proposition 4.2] or [Mol09, Corollary 3.3]) to write $s^*_\mu s^*_\nu$ as a linear combination of $s^*_\tau$ polynomials. Unfortunately, these Littlewood-Richardson coefficients seem somewhat difficult to work with. Instead, we will expand the shifted Schur polynomials in terms of the central characters and then multiply them via the known structure constants. We do this in the lemma below, carried out for a generic Schur-Weyl distribution. In this lemma, $\mathfrak{S}(R)$ denotes the symmetric group acting on the finite set $R$.

Lemma 4.8. Let q = {qi,..., qd) be a probability distribution on [d] and let pL\- ri, v\- r 2 . Then 


ri+r 2 

E [4(A)s:(A)] = y 

A~SW" L MV ^ /j ^ 

^ t=riVr2 


r\r2 


n 


4,i 


AXui'^i)Xu{w2)pwiW2iQ)] ■ 

'W2''^&{R2) 


Here, for each choice of $t$, we let $R_1, R_2$ denote (arbitrary but fixed) subsets of $[t]$ having cardinality $r_1, r_2$, respectively, with $R_1 \cup R_2 = [t]$. (E.g., $R_1 = \{1, \dots, r_1\}$, $R_2 = \{t - r_2 + 1, \dots, t\}$.) Also, $\overline{w}_1$ denotes the extension of $w_1$ to $\mathfrak{S}_t$ formed by letting $w_1$ fix each element of $[t] \setminus R_1$; similarly for $\overline{w}_2$.

Proof. Recall the notation $\rho(w)$ from Section 2.3 used to denote the cycle type of a permutation $w$. In this proof, we also use the following notation: We write $\rho \sim \mathfrak{S}_r$ to denote that $\rho$ is a random partition of $r$ formed by first choosing $w \sim \mathfrak{S}_r$ uniformly and then taking $\rho = \rho(w)$.

Using Theorem 2.33 for the first equality below, and Corollary 2.37 for the third equality, we have


E [4(A)s:(A)] 

A~SW" 


= E 

A~SWJ L, 


E iXuiPi)- pLW]- E [Xuip2)-pLW] 


Pi ~6 


ri 


P 2 ~e 


’’2 


= E 

Pl~ 6 rj^ 

P 2 '~®r '2 


XuiPi)Xu{p2) • , E \pLW-PpA><) 


A~sw: 


= E 

P 2 ~ 6 r 2 


xMxAp2) ■ 


ri-|-r 2 


xt 

Xrir2 


E Ev^; 

=riVr2 rht 

where here Wi is chosen to be a uniformly random permutation on Ri (as in the lemma’s statement), 


Pr [p{wiW2) = t] ■ pI{\) 

tDl~6(ifi)|pi 

tD2'~6(i?2)|P2 
conditioned on having cycle type $\rho_i$. By Proposition 2.34 the above equals


E 

P 2 ~ 6 r 2 

ri+r 2 


r\+r2 


it 

^rir2 


t=riVr2 rht 


E 

t=ri'Vr2 


C: 


Xii{pi)xu{p2) ■ 

XlJ.^Pl)Xv{P2) ■ 'y ^ ^{p{w\W2)=t} ' Pri^) 


Pr [p{wiW 2 ) = t] ■ • pr{q) 

wl^e(Rl)\p-^^ 

W2^&{R2)\p2 




rir2 


E 

P 2 ~®T ’2 

'W2'^&{R2)\p2 


r\-t 


The summation on the inside here simply equals $p_{\rho(\overline{w}_1 \overline{w}_2)}(q)$; we may also replace $\chi_\mu(\rho_1)$ with $\chi_\mu(w_1)$, and similarly for $\chi_\nu(\rho_2)$. Thus to complete the proof it remains to show that $w_1$ and $w_2$ have the same distribution as in the statement of the lemma. But this is clear: if we first pick a random permutation of $r_i$ symbols, then take its cycle type, then set $w_i$ to be a random permutation of $r_i$ symbols of this cycle type, this is the same as simply taking $w_i$ to be a uniformly random permutation of $r_i$ symbols. □


We will also require the following Fourier-theoretic lemma: 

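Lemma 4.9. For $u \in \mathfrak{S}_r$, $\nu \vdash r$, and $d \in \mathbb{Z}^+$,

\[
\mathop{\mathbf{E}}_{w \sim \mathfrak{S}_r}\bigl[\chi_\nu(w) \cdot d^{\ell(wu)}\bigr] = \chi_\nu(u) \cdot \frac{d^{\uparrow\nu}}{r!}.
\]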
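As a sanity check, the identity of Lemma 4.9 can be verified by brute force for small cases. The sketch below assumes the lemma reads $\mathbf{E}_{w \sim \mathfrak{S}_r}[\chi_\nu(w) \cdot d^{\ell(wu)}] = \chi_\nu(u) \cdot d^{\uparrow\nu}/r!$, with $\ell$ the number of cycles and $d^{\uparrow\nu} = \prod_{\square \in [\nu]} (d + c(\square))$; it treats only $r = 3$, using a hard-coded character table of $\mathfrak{S}_3$. All function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_type(w):
    """Cycle type of a permutation given as a tuple (w[i] is the image of i)."""
    seen, lengths = set(), []
    for start in range(len(w)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = w[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def compose(w, u):
    """The product wu: apply u first, then w."""
    return tuple(w[u[i]] for i in range(len(u)))

# Character table of S_3: CHI[nu][cycle type] (assumed standard values).
CHI = {
    (3,):      {(1, 1, 1): 1, (2, 1): 1,  (3,): 1},
    (2, 1):    {(1, 1, 1): 2, (2, 1): 0,  (3,): -1},
    (1, 1, 1): {(1, 1, 1): 1, (2, 1): -1, (3,): 1},
}

def rising(d, nu):
    """d^{up nu}: product of (d + content) over the boxes of nu."""
    out = 1
    for i, row in enumerate(nu):
        for j in range(row):
            out *= d + (j - i)
    return out

def lhs(nu, u, d):
    """Average over w in S_3 of chi_nu(w) * d^(number of cycles of wu)."""
    perms = list(permutations(range(3)))
    total = sum(CHI[nu][cycle_type(w)] * d ** len(cycle_type(compose(w, u)))
                for w in perms)
    return Fraction(total, len(perms))

def rhs(nu, u, d):
    """chi_nu(u) * d^{up nu} / r!  with r = 3."""
    return Fraction(CHI[nu][cycle_type(u)] * rising(d, nu), factorial(3))

def check_all(max_d=5):
    """Compare both sides for every nu, every u in S_3, and d = 1..max_d."""
    return all(lhs(nu, u, d) == rhs(nu, u, d)
               for nu in CHI
               for u in permutations(range(3))
               for d in range(1, max_d + 1))
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point tolerance.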

Proof. Define the class function e on by 

e(p) = 

d entries 

Since Xu{w) = Xu{'w~^) because Xv is a class function, the quantity on the left in the proposition’s 
statement is 


E [Xu{w 1) = E [xuiv = {e*Xu){u) = J]]eT3^(/i)x^(u) 


W'^&r 


fihr 


Y dli^e(/x)x^(p)x^(n) = ai^e(p)x^(n) = ..., l)xAu) = 


/ihr 

the last equality being Proposition 2.11. □

We can now complete the proof of Theorem 4.7 (and therefore also Theorem 4.4):

Proof of Theorem 4.7. We will use Lemma 4.8 in the case of $\mathrm{SW}^n_d$, i.e., $q = (\frac{1}{d}, \dots, \frac{1}{d})$; in this case, for $\tau \vdash t$ we have $p_\tau(q) = d^{\ell(\tau) - t}$. We thereby obtain

V ^ E [s;;(A)<(A)] 
dJ'^ A~SW 2 ^ ^ ^ 

W\=r2 

l{v)<d 


^ 1+^2 

p_ E 


Ec; 

t=r2 


Xni't^i) • Y 


W\=r2 

pv)<d 


Sy{x) 
df'^ l 02 ~S(i? 2 ) 


E 


'xAw2)d^^^^^^A]. (21) 
(Here we are using the convention $\ell(\overline{w}_1 \overline{w}_2) = \ell(\rho(\overline{w}_1 \overline{w}_2))$.) We now would like to analyze the number of cycles of $\overline{w}_1 \overline{w}_2$ within $\mathfrak{S}_t$. In $w_1$’s cycle decomposition, there are some cycles that act only on elements of $R_1 \setminus R_2$. Let’s write $\ell'(w_1)$ for the number of such cycles, and let’s define $\widetilde{w}_1 \in \mathfrak{S}_t$ to be $\overline{w}_1$ with those cycles deleted. Thus

\[
\ell(\overline{w}_1 \overline{w}_2) = \ell'(w_1) + \ell(\widetilde{w}_1 \cdot \overline{w}_2).
\]

Next, let $w_1^\flat$ denote the permutation obtained by deleting every element of $R_1 \setminus R_2$ from the cycle decomposition of $\widetilde{w}_1$. Though $w_1^\flat$ acts only on $R_1 \cap R_2$, we will view it as an element of $\mathfrak{S}(R_2)$. Although we don’t have $\widetilde{w}_1 \cdot \overline{w}_2 = w_1^\flat \cdot w_2$, it’s not too hard to see that

\[
\ell(\widetilde{w}_1 \cdot \overline{w}_2) = \ell(w_1^\flat \cdot w_2).
\]

Thus we obtain

ri+r2 


it 




t=r2 


\u\=r2 

e{u)<d 


Su{x) 


E 

102~©(-R2) - 




Applying Lemma 4.9, we deduce 


ri+r2 


l-t 




t=r2 


1 

r2'. 


\v\=r2 


Notice that we may extend the summation over $\nu$ to include $\ell(\nu) > d$ as well: since $x$ has $d$ coordinates, $s_\nu(x) = 0$ anyway when $\ell(\nu) > d$ by Proposition 2.12. Having done this, we replace $s_\nu(x)$ with $\mathbf{E}_{v \sim \mathfrak{S}_{r_2}}[\chi_\nu(v) p_v(x)]$, obtaining


ri+r2 




( 21 )= E E .,J_ 


t=r2 


1 

r 2 ! 


W\=r2 




^1+^2 It 

^rir2 ^ ^ 


E 

t=r2 


r2\ d£ 


• E \pv{x) ■ ^ Xu{v)xu{wi) 

V'^&ro I- ^^ 

" W\=r 2 


We claim that the inner expectation is 0 in most cases. First, $p_v(x)$ vanishes whenever $v$ has a fixed point, since $p_1(x) = x_1 + \cdots + x_d = 0$ by assumption. Next, suppose that $v$ has no fixed points. By the orthogonality relations of representation theory, the innermost sum vanishes unless $v$ and $w_1^\flat$ are conjugate. Since $w_1^\flat \in \mathfrak{S}(R_2)$ acts only on $R_1 \cap R_2$, it must have a fixed point (and therefore not be conjugate to $v$) unless $R_2 \setminus R_1 = \emptyset$. Since $r_2 \ge r_1$, this can only happen if $|\mu| = r_1 = r_2 = t$. We conclude that the inner expectation can only be nonzero in case $|\mu| = r_1 = r_2 = t$. In this case
we have = r 2 ! and = 0 , whence 


(21) = l{r.2=ri} • 


n 


4,ri 


, E 

a’’! ■Wi'^&r 




E 


Pv{x) 


Xu{v)Xv 

H=ri 




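Once again, the summation is 0 if $v$ and $w_1$ are not conjugate; otherwise it equals $z_{\rho(w_1)}$. Further, having chosen $w_1$, the probability that $v$ is conjugate to $w_1$ is precisely $1/z_{\rho(w_1)}$. Thus these factors cancel and we obtain

\[
(21) = 1_{\{r_2 = r_1\}} \cdot \frac{n^{\downarrow r_1}}{d^{r_1}} \cdot \mathop{\mathbf{E}}_{w_1 \sim \mathfrak{S}_{r_1}}\bigl[\chi_\mu(w_1) \cdot p_{w_1}(x)\bigr] = 1_{\{r_2 = r_1\}} \cdot \frac{n^{\downarrow r_1}}{d^{r_1}} \cdot s_\mu(x),
\]

completing the proof. □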
4.4 A formula for $s_\mu(+1, -1, +1, -1, \dots)$

For this formula we will need to recall the notion of the 2-quotient of a partition. This definition essentially encodes the ways in which a partition can be tiled by dominoes.

Definition 4.10. Given a partition $\mu$, a 2-hook in $[\mu]$ is a hook of length 2; i.e., a domino whose removal from $[\mu]$ results in a valid Young diagram.

Definition 4.11. A partition $\mu$ is said to be balanced (or to have an empty 2-core) if $[\mu]$ can be reduced to the empty diagram by successive removal of 2-hooks.

Definition 4.12. Given a partition $\mu$ we write $[\mu]_{\mathrm{even}}$ (respectively, $[\mu]_{\mathrm{odd}}$) for the set of boxes $\square \in [\mu]$ with even (respectively, odd) content $c(\square)$.

Remark 4.13. It’s obvious from Definition 4.11 that if $\mu \vdash k$ is balanced then $|[\mu]_{\mathrm{even}}| = k/2$. In fact, the converse also holds (this follows from, e.g., [JK81, Theorem 2.7.41]).
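The equivalence in Remark 4.13 is easy to test by machine on small partitions. The sketch below is illustrative only (the helper names are not from the paper): it decides balancedness by greedily removing 2-hooks, relying on the standard fact that the 2-core does not depend on the order of removal, and compares the outcome with the count of even-content boxes.

```python
def partitions(k, largest=None):
    """Yield all partitions of k as weakly decreasing tuples."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def remove_some_2hook(mu):
    """Return mu with one domino (2-hook) removed, or None if mu has none."""
    rows = list(mu) + [0, 0]
    for i in range(len(mu)):
        # Horizontal domino at the end of row i.
        if rows[i] - 2 >= rows[i + 1]:
            rows[i] -= 2
            return tuple(r for r in rows if r > 0)
        # Vertical domino at the ends of rows i and i+1.
        if rows[i] == rows[i + 1] and rows[i] - 1 >= rows[i + 2]:
            rows[i] -= 1
            rows[i + 1] -= 1
            return tuple(r for r in rows if r > 0)
    return None

def is_balanced(mu):
    """True if mu can be reduced to the empty diagram by removing 2-hooks."""
    while mu:
        mu = remove_some_2hook(mu)
        if mu is None:
            return False
    return True

def even_content_count(mu):
    """Number of boxes (row i, column j) whose content j - i is even."""
    return sum(1 for i, row in enumerate(mu) for j in range(row)
               if (j - i) % 2 == 0)

def converse_holds(max_k):
    """Check: mu is balanced  <=>  |[mu]_even| = |mu| / 2, for all |mu| <= max_k."""
    return all(is_balanced(mu) == (2 * even_content_count(mu) == sum(mu))
               for k in range(max_k + 1) for mu in partitions(k))
```

For instance, the partition $(6,4,4,3,3) \vdash 20$ of Figure 4 has 10 boxes of even content and is balanced, whereas $(2,1)$ is its own 2-core.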



Figure 4: The Russian and Maya diagrams for $\mu = (6, 4, 4, 3, 3) \vdash 20$. The segments and pebbles corresponding to the 2-quotient pair are colored green and red. The dashed lines outline a 2-hook that could be removed; $d$ is the square in this 2-hook with even content (namely, $-2$).

Figure 5: The diagram for 2-quotient partition $\mu^{(0)} = (2, 1) \vdash 3$.

Figure 6: The diagram for 2-quotient partition $\mu^{(1)} = (3, 2, 2) \vdash 7$. The 1-hook square $s$ (with content $-1$) is associated to the 2-hook in Figure 4 that contains square $d$.

Definition 4.14. Let $\mu$ be a partition. From the Maya diagram for $[\mu]$, form two new Maya diagrams by taking the two alternating sequences of pebbles. More precisely, for $b \in \{0, 1\}$, let $\mu^{(b)}$ denote the partition whose Maya diagram is formed by the pebbles at positions $2z + (-1)^b \tfrac{1}{2}$, $z \in \mathbb{Z}$. (See Figure 4, in which $b = 0$ is associated to green and $b = 1$ is associated to red.) The pair $(\mu^{(0)}, \mu^{(1)})$ is called the 2-quotient of $\mu$. (See Figures 5, 6 respectively.)

Remark 4.15. Note that when the Maya diagrams for $\mu^{(0)}, \mu^{(1)}$ are formed, each of the two origin mark positions may need to be adjusted from the former origin mark position coming from $\mu$’s origin mark. It is a fact (see, e.g., [RZ12, Section 2.1]) that $\mu$ is balanced if and only if neither origin mark position must be adjusted.

Fact 4.16. A 2-hook in $[\mu]$ naturally corresponds to a sequence of three pebbles in $[\mu]$’s Maya diagram of the form (white, $\star$, black). (See the dashed domino containing the label $d$ in Figure 4.) In turn, this corresponds to a “1-hook” in one of $\mu^{(0)}, \mu^{(1)}$; i.e., a square on the rim whose removal leaves a valid Young diagram (see the square labeled $s$ in Figure 6). Removal of the 2-hook from $[\mu]$ corresponds to replacing the sequence (white, $\star$, black) by (black, $\star$, white). (One thinks of the “filled” black pebble as jumping two positions to the left, onto the “empty” white pebble.) In turn, this corresponds to removing the associated 1-hook from either $\mu^{(0)}$ or $\mu^{(1)}$.

We will require the following lemma. It is likely to be known; however we were unable to find its 
statement in the literature. The analogous lemma for hook lengths is well known (see, e.g., [RZ12, 
Lemma 2.1.ii]). 

Lemma 4.17. Let $\mu \vdash k$ be a balanced partition with 2-quotient $(\mu^{(0)}, \mu^{(1)})$. Then the multiset $\{c(\square) : \square \in [\mu^{(0)}] \sqcup [\mu^{(1)}]\}$ is equal to the multiset $\{\tfrac{1}{2} c(\square) : \square \in [\mu]_{\mathrm{even}}\}$.

Proof. The statement is proved by induction on the deconstruction of $\mu$ from 2-hooks, with the base case being $\mu = \emptyset$. We rely on the fact that since $\mu$ is balanced, the Maya diagrams of $\mu^{(0)}$ and $\mu^{(1)}$ can be seen alternating within the Maya diagram for $\mu$, with all three origin markers “lining up” (see Remark 4.15). By way of induction, suppose we consider the removal of some 2-hook $D$ from $[\mu]$. This corresponds (see Fact 4.16) to removing a 1-hook (square) $s$ from $\mu^{(b)}$, for some $b \in \{0, 1\}$. Exactly one of $D$’s two squares is in $[\mu]_{\mathrm{even}}$; call that square $d$. (See Figures 4, 6 for illustration.) By induction, it suffices to show that $\tfrac{1}{2} c(d) = c(s)$. But this is easily seen from the combination of the Russian and Maya diagrams, as the content of a square is simply the horizontal displacement of its center. □

We are now ready to establish a formula for $s_\mu(+1, -1, +1, -1, \dots)$.
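Theorem 4.18. Let $\mu \vdash k$ and let $d$ be even. Then

\[
s_\mu(\underbrace{+1, -1, +1, -1, \dots}_{d \text{ entries}}) =
\begin{cases}
0 & \text{if $\mu$ is not balanced,} \\[4pt]
\chi_\mu(\underbrace{2, 2, \dots, 2}_{k/2 \text{ entries}}) \cdot \dfrac{1}{k!!} \cdot d^{\uparrow[\mu]_{\mathrm{even}}} & \text{if $\mu$ is balanced.}
\end{cases}
\]

Proof. The first part of the proof relies on a formula from [RSW04, Theorem 4.3], specialized to the case of “$t$” $= 2$:

\[
s_\mu(\underbrace{+1, -1, +1, -1, \dots}_{d \text{ entries}}) =
\begin{cases}
0 & \text{if $\mu$ is not balanced,} \\[4pt]
\mathrm{sgn}\bigl(\chi_\mu(\underbrace{2, 2, \dots, 2}_{k/2 \text{ entries}})\bigr) \cdot s_{\mu^{(0)}}(\underbrace{1, 1, \dots, 1}_{d/2 \text{ entries}}) \cdot s_{\mu^{(1)}}(\underbrace{1, 1, \dots, 1}_{d/2 \text{ entries}}) & \text{if $\mu$ is balanced,}
\end{cases}
\]

where $(\mu^{(0)}, \mu^{(1)})$ is the 2-quotient of $\mu$. Thus it suffices to show

\[
s_{\mu^{(0)}}(1, 1, \dots, 1) \cdot s_{\mu^{(1)}}(1, 1, \dots, 1) = \frac{|\chi_\mu(2, 2, \dots, 2)| \cdot d^{\uparrow[\mu]_{\mathrm{even}}}}{(k/2)! \cdot 2^{k/2}} \tag{22}
\]

assuming $\mu$ is balanced. Applying Proposition 2.11, the left-hand side of (22) is

\[
\frac{\dim\mu^{(0)}}{|\mu^{(0)}|!}\,(d/2)^{\uparrow\mu^{(0)}} \cdot \frac{\dim\mu^{(1)}}{|\mu^{(1)}|!}\,(d/2)^{\uparrow\mu^{(1)}}.
\]

Next, we appeal to [RZ12, formula (2.2)], which states

\[
\chi_\mu(2, 2, \dots, 2) = \sigma_\mu \cdot \binom{k/2}{|\mu^{(0)}|,\ |\mu^{(1)}|} \cdot \dim\mu^{(0)} \cdot \dim\mu^{(1)},
\]

where $\sigma_\mu \in \{\pm 1\}$ is a certain sign. Thus to verify (22) it remains to show

\[
(d/2)^{\uparrow\mu^{(0)}} \cdot (d/2)^{\uparrow\mu^{(1)}} = \frac{d^{\uparrow[\mu]_{\mathrm{even}}}}{2^{k/2}}. \tag{23}
\]

But this follows immediately from Lemma 4.17. □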

4.5 Wrapping up the lower bound 

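In this section we complete the proof of Theorem 4.2. We begin by applying Corollary 4.5 with $x = (+2\epsilon, -2\epsilon, +2\epsilon, -2\epsilon, \dots)$. Using Theorem 4.18 and the homogeneity of Schur polynomials, we obtain the following after a few manipulations:

Theorem 4.19. For $d$ even, $0 < \epsilon \le \frac{1}{2}$, and $x = (+2\epsilon, -2\epsilon, \dots, +2\epsilon, -2\epsilon)$,

\[
d_{\chi^2}(\mathrm{SW}^n_{Q_x}, \mathrm{SW}^n_d) = \sum_{k = 2, 4, 6, \dots} n^{\downarrow k} (2\epsilon)^{2k} d^{-k} \left( \frac{1}{(k!!)^2} \sum_{\substack{\mu \vdash k \text{ balanced} \\ 0 < \ell(\mu) \le d}} \chi_\mu(2, \dots, 2)^2 \cdot \frac{d^{\uparrow[\mu]_{\mathrm{even}}}}{d^{\uparrow[\mu]_{\mathrm{odd}}}} \right). \tag{24}
\]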
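To estimate this quantity we will use the following very crude bound:

Proposition 4.20. Let $d \in \mathbb{Z}^+$ and let $\mu \vdash k$ be balanced, with $0 < \ell(\mu) \le d$. Then

\[
\frac{d^{\uparrow[\mu]_{\mathrm{even}}}}{d^{\uparrow[\mu]_{\mathrm{odd}}}} \le 2^{k/2}. \tag{25}
\]

Proof. Fix any domino-tiling for $\mu$. Each of the $k/2$ dominoes contains one cell of even content $c_e$ and one cell of odd content $c_o$, with $|c_e - c_o| = 1$. Since $\ell(\mu) \le d$ we have $d + c_o \ge 1$, and thus each domino contributes a factor of $\frac{d + c_e}{d + c_o} \le \frac{d + c_o + 1}{d + c_o} \le 2$ to $d^{\uparrow[\mu]_{\mathrm{even}}} / d^{\uparrow[\mu]_{\mathrm{odd}}}$. □

By character orthogonality relations we also have

\[
\sum_{\substack{\mu \vdash k \text{ balanced} \\ 0 < \ell(\mu) \le d}} \chi_\mu(2, \dots, 2)^2 \le \sum_{\mu \vdash k} \chi_\mu(2, \dots, 2)^2 = z_{(2, \dots, 2)} = k!!. \tag{26}
\]

Combining (25), (26), we get that the parenthesized expression in (24) is at most $2^{k/2}/k!! = 1/(k/2)!$. Using also $n^{\downarrow k} \le n^k$, the right-hand side of (24) is thus bounded by

\[
\sum_{k = 2, 4, 6, \dots} n^k (2\epsilon)^{2k} d^{-k} / (k/2)! = \exp\bigl((4n\epsilon^2/d)^2\bigr) - 1,
\]

completing the proof of Theorem 4.2.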
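The ingredients of this final estimate can be checked numerically on small cases. The sketch below is illustrative only and assumes the reading $d^{\uparrow S} = \prod_{\square \in S} (d + c(\square))$ for a set $S$ of boxes, with balancedness decided through the even-content criterion of Remark 4.13. It tests the bound (25), the identity $k!! = 2^{k/2} (k/2)!$, and the closed form of the series; the function names are hypothetical.

```python
from fractions import Fraction
from math import exp, factorial

def partitions(k, largest=None):
    """Yield all partitions of k as weakly decreasing tuples."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def contents(mu):
    """Contents j - i of all boxes (row i, column j) of mu."""
    return [j - i for i, row in enumerate(mu) for j in range(row)]

def is_balanced(mu):
    """Criterion of Remark 4.13: exactly half of the boxes have even content."""
    return 2 * sum(1 for c in contents(mu) if c % 2 == 0) == sum(mu)

def ratio(mu, d):
    """d^{up [mu]_even} / d^{up [mu]_odd}; requires len(mu) <= d."""
    value = Fraction(1)
    for c in contents(mu):
        value = value * (d + c) if c % 2 == 0 else value / (d + c)
    return value

def worst_ratio(k, d):
    """Largest ratio over balanced partitions of k with at most d rows."""
    return max(ratio(mu, d) for mu in partitions(k)
               if len(mu) <= d and is_balanced(mu))

def double_factorial(k):
    """k!! = k (k-2) (k-4) ... 2 for even k."""
    out = 1
    while k > 1:
        out *= k
        k -= 2
    return out

def series(z, terms=60):
    """Partial sum of z^k / (k/2)! over even k >= 2, i.e. sum_j z^(2j) / j!."""
    return sum(z ** (2 * j) / factorial(j) for j in range(1, terms + 1))
```

With $z = 4n\epsilon^2/d$, `series(z)` is the right-hand side of the last display; the bound is tight for (25) already at $\mu = (1,1)$, $d = 2$.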


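We end by indicating how to obtain the testing lower bound in the case when $d \ge 3$ is odd. In this case we define the alternative distribution to be $(\frac{1+2\epsilon}{d}, \frac{1-2\epsilon}{d}, \dots, \frac{1+2\epsilon}{d}, \frac{1-2\epsilon}{d}, \frac{1}{d})$. This distribution has total variation distance $\frac{d-1}{d}\epsilon \ge \frac{2}{3}\epsilon$ from $\mathrm{Unif}_d$; since this differs from $\epsilon$ only by a constant factor, the lower bound of $\Omega(d/\epsilon^2)$ is not affected. Now Corollary 4.5 is applied with $x = (+2\epsilon, -2\epsilon, \dots, +2\epsilon, -2\epsilon, 0)$. By stability of the shifted Schur polynomials we have $s_\mu(+1, -1, \dots, +1, -1, 0) = s_\mu(+1, -1, \dots, +1, -1)$, where there are $d - 1$ entries in the latter. Now we get $\chi_\mu(2, 2, \dots, 2) \cdot \frac{1}{k!!} \cdot (d-1)^{\uparrow[\mu]_{\mathrm{even}}}$ from Theorem 4.18, and we can simply upper-bound $(d - 1)$ by $d$ and proceed with the remainder of the proof.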


5 Hardness of distinguishing uniform distributions 

In this section, we prove Theorem 1.12, namely that $O(r^2/\Delta)$ copies are sufficient to distinguish between the cases when $\rho$’s spectrum is uniform on either $r$ or $r + \Delta$ eigenvalues ($1 \le \Delta \le r$), and that $\widetilde{\Omega}(r^2/\Delta)$ copies are necessary. To be more precise, our lower bound on the number of copies $n$ will be

\[
n \ge r^{2 - O(1/\log^{.33} r)} / \Delta. \tag{27}
\]


5.1 The upper bound 

The proof of the upper bound is quite similar to that of Theorem 4.1 for the Mixedness Tester. We employ the following tester:

Uniform Distribution Distinguisher. Given $\rho^{\otimes n}$:

1. Sample $\lambda \sim \mathrm{SW}^n_\rho$.

2. Accept if $p^\#_2(\lambda) \ge \theta := n(n-1) \cdot \frac{1}{2}\bigl(\frac{1}{r} + \frac{1}{r + \Delta}\bigr)$. Reject otherwise.

As for the analysis, from Equations (17) and (18):

\[
\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_m}\bigl[p^\#_2(\lambda)\bigr] = \frac{n(n-1)}{m}
\qquad \text{and} \qquad
\mathop{\mathbf{Var}}_{\lambda \sim \mathrm{SW}^n_m}\bigl[p^\#_2(\lambda)\bigr] \le 2n(n-1).
\]

We see that the variance bound is the same whether $m = r$ or $m = r + \Delta$; only the expectation is different, and the tester’s acceptance cutoff $\theta$ is precisely the midway point between the two expectations. If $m = r$, then Chebyshev’s inequality implies

\[
\Pr\bigl[p^\#_2(\lambda) < \theta\bigr] \le \frac{8 r^2 (r + \Delta)^2}{n(n-1)\Delta^2} \le \frac{32 r^4}{(n-1)^2 \Delta^2},
\]

and we have the same upper bound by Chebyshev for $\Pr[p^\#_2(\lambda) \ge \theta]$ when $m = r + \Delta$. This upper bound is at most $1/3$ provided $n \ge 4\sqrt{6} \cdot \frac{r^2}{\Delta} + 1$, completing the proof of the upper bound in Theorem 1.12.

The end of Section 6.1 gives a different $O(r^2)$-copy tester (the “Rank Tester”) for the $r$-versus-$(r+1)$ case. In this case it’s superior to the Uniform Distribution Distinguisher in that it has one-sided error (i.e., it never rejects in the rank-$r$ case).
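For intuition, the Uniform Distribution Distinguisher can be simulated classically when the spectrum is uniform on $m$ eigenvalues. The following is only a sketch under two standard assumptions that are not spelled out in this section: that $\mathrm{SW}^n_m$ is the law of the RSK shape of a uniformly random word in $[m]^n$, and that $p^\#_2(\lambda)$ equals twice the sum of the contents of the boxes of $\lambda$. The function names are hypothetical, and the acceptance direction follows the reading that "accept" means "uniform on $r$ eigenvalues".

```python
import bisect
import math
import random

def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word` (row insertion)."""
    rows = []
    for x in word:
        for row in rows:
            pos = bisect.bisect_right(row, x)  # first entry strictly greater than x
            if pos == len(row):
                row.append(x)
                break
            row[pos], x = x, row[pos]          # bump, then insert into the next row
        else:
            rows.append([x])                   # bumped out of the last row
    return [len(row) for row in rows]

def p2_sharp(shape):
    """p#_2(lambda), taken here as 2 * (sum of contents of the boxes)."""
    return 2 * sum(j - i for i, row_len in enumerate(shape) for j in range(row_len))

def threshold(n, r, delta):
    """Midpoint between the two expectations n(n-1)/r and n(n-1)/(r+delta)."""
    return n * (n - 1) * 0.5 * (1 / r + 1 / (r + delta))

def distinguisher(word, r, delta):
    """Accept (True) when the statistic is at least the threshold."""
    return p2_sharp(rsk_shape(word)) >= threshold(len(word), r, delta)

def copies_sufficient(r, delta):
    """Smallest integer n with n >= 4*sqrt(6)*r^2/delta + 1."""
    return math.ceil(4 * math.sqrt(6) * r * r / delta + 1)

def empirical_mean(n, m, samples, seed=0):
    """Average of p#_2 over RSK shapes of uniformly random words in [m]^n."""
    rng = random.Random(seed)
    total = sum(p2_sharp(rsk_shape([rng.randrange(m) for _ in range(n)]))
                for _ in range(samples))
    return total / samples
```

A constant word gives the one-row shape $(n)$ and statistic $n(n-1)$, matching the expectation formula for $m = 1$; for larger $m$ the empirical mean should be close to $n(n-1)/m$.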


5.2 The lower bound 


The bulk of our work for the lower bound will be devoted to the case of $\Delta = 1$. The extension to larger $\Delta$ is very tedious and will be dealt with in Section 5.3. So let $r \in \mathbb{Z}^+$ be a parameter which we think of as tending to infinity, and for brevity let $r_+ = r + 1$. Our task is to show that the distributions $\mathrm{SW}^n_r$ and $\mathrm{SW}^n_{r_+}$ are very close in total variation distance unless $n \ge \widetilde{\Omega}(r^2)$. For notational convenience we will write

\[
n = \frac{r^2}{\omega}
\]

and seek to show that $\mathrm{SW}^n_r$ and $\mathrm{SW}^n_{r_+}$ are close once $\omega$ is sufficiently large as a function of $r$. Ultimately we will select $\omega = \exp(\Theta(\log^{.67} r))$. For now, though, let’s keep $\omega$ general, subjecting it only to the following assumption:

\[
200 \le \omega \le \sqrt{r}. \tag{28}
\]

5.2.1 Initial approximations 

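It proves more convenient to study the Kullback-Leibler divergence between $\mathrm{SW}^n_r$ and $\mathrm{SW}^n_{r_+}$:

\[
\begin{aligned}
d_{\mathrm{KL}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r_+})
&= \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[\ln\left(\frac{\mathrm{SW}^n_r[\lambda]}{\mathrm{SW}^n_{r_+}[\lambda]}\right)\right] \\
&= \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[\ln\left(\Bigl(\frac{r_+}{r}\Bigr)^n \cdot \frac{\prod_{\square \in [\lambda]} (r + c(\square))}{\prod_{\square \in [\lambda]} (r_+ + c(\square))}\right)\right] \\
&= n \ln\Bigl(\frac{r_+}{r}\Bigr) + \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[\ln\left(\frac{\prod_{\square \in [\lambda]} (r + c(\square))}{\prod_{\square \in [\lambda]} (r_+ + c(\square))}\right)\right],
\end{aligned}
\tag{29}
\]

where the second equality used Proposition 2.26. (We remark that the logarithms above are always finite since $\mathrm{supp}(\mathrm{SW}^n_r) \subseteq \mathrm{supp}(\mathrm{SW}^n_{r_+})$.)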
Recalling that $r_+ = r + 1$, it is very easy to verify (cf. [Mac95, Exercise I.1.11], [CGS04, Section 2.5]) that the large fraction inside the inner logarithm of (29) is equal to

\[
\prod_{i=1}^{\ell(\lambda)} \frac{r - (i - 1)}{r - (i - 1 - \lambda_i)} = \Phi\bigl(-(r + \tfrac{1}{2}); \lambda\bigr),
\]

where $\Phi(z; \lambda)$ denotes a generating function for the modified Frobenius coordinates, defined in [IO02] and similar to the “Frobenius function” from [Las08, CSST10]. Proposition 1.2 in [IO02] observes that

\[
\Phi(z; \lambda) = \prod_i \frac{z + b_i^*}{z - a_i^*},
\]

where the $a_i^*$’s and $b_i^*$’s are the modified Frobenius coordinates of $\lambda$; as a consequence, Proposition 1.4 in [IO02] states that

\[
\ln \Phi(z; \lambda) = \sum_{k=1}^{\infty} \frac{p_k^*(\lambda)}{k} z^{-k}. \tag{30}
\]

However we cannot immediately take $z = -(r + \tfrac{1}{2})$ and conclude

\[
(29) = n \ln\Bigl(\frac{r_+}{r}\Bigr) + \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[\sum_{k=1}^{\infty} \frac{(-1)^k p_k^*(\lambda)}{k (r + \tfrac{1}{2})^k}\right] \tag{31}
\]

because (30) is merely a formal identity of generating functions and does not hold for all real $z$. More specifically, it’s necessary that the Taylor series for $\ln(1 + b_i^*/z)$ and $\ln(1 - a_i^*/z)$ converge, which happens provided $|b_i^*/(r + \tfrac{1}{2})|, |a_i^*/(r + \tfrac{1}{2})| < 1$. These conditions are equivalent to $\ell(\lambda) = \lambda_1' < r + 1$ and $\lambda_1 < r + 1$. The first condition is automatic, since $\lambda \sim \mathrm{SW}^n_r$. The second condition does not always hold; however, we will show (see Lemma 5.2 below) that it holds with overwhelming probability when $n \ll r^2$. Indeed the “central limit theorems” for the Schur-Weyl distributions suggest that both $\lambda_1$ and $\lambda_1'$ will almost always be $O(\sqrt{n}) = O(r/\sqrt{\omega})$. Let us therefore make a definition:

Definition 5.1. We say that $\lambda \vdash n$ is usual if $\lambda_1, \lambda_1' \le \frac{10}{\sqrt{\omega}} r$. Since we are assuming $\omega \ge 200$, usual $\lambda$’s satisfy $\lambda_1, \lambda_1' \le \frac{10}{\sqrt{200}} r < r + 1$.

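Thus when $\lambda$ is usual we may apply (31). Since the quantity inside the expectation in (29) is clearly always negative, we may write

\[
\begin{aligned}
d_{\mathrm{KL}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r_+}) = (29)
&\le n \ln\Bigl(\frac{r_+}{r}\Bigr) + \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[1_{\{\lambda \text{ usual}\}} \cdot \ln\left(\frac{\prod_{\square \in [\lambda]} (r + c(\square))}{\prod_{\square \in [\lambda]} (r_+ + c(\square))}\right)\right] \\
&= n \ln\Bigl(1 + \frac{1}{r}\Bigr) + \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[1_{\{\lambda \text{ usual}\}} \cdot \sum_{k=1}^{\infty} \frac{(-1)^k p_k^*(\lambda)}{k (r + \tfrac{1}{2})^k}\right] \\
&= n \ln\Bigl(1 + \frac{1}{r}\Bigr) - \frac{1}{r + \tfrac{1}{2}} \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\bigl[1_{\{\lambda \text{ usual}\}} \cdot p_1^*(\lambda)\bigr] && (32) \\
&\quad + \mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}\left[1_{\{\lambda \text{ usual}\}} \cdot \sum_{k=2}^{\infty} \frac{(-1)^k p_k^*(\lambda)}{k (r + \tfrac{1}{2})^k}\right]. && (33)
\end{aligned}
\]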
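Recall that $p_1^*(\lambda)$ is simply $|\lambda|$; thus the expectation in (32) is simply $n \Pr[\lambda \text{ usual}]$. As Lemma 5.2 below shows, $\Pr[\lambda \text{ usual}] = 1 - \delta$ for $\delta \le 2^{-20r/\sqrt{\omega}} \le \frac{1}{60 r^2}$. Thus:

\[
(32) = n\left(\ln\Bigl(1 + \frac{1}{r}\Bigr) - \frac{1}{r + \tfrac{1}{2}} + \frac{\delta}{r + \tfrac{1}{2}}\right)
\le n\left(\frac{1}{12 r^3} + \frac{1/(60 r^2)}{r + \tfrac{1}{2}}\right)
\le \frac{n}{10 r^3} = \frac{1}{10 \omega r}. \tag{34}
\]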


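Lemma 5.2. Let $\lambda \sim \mathrm{SW}^n_r$. Then $\Pr[\lambda \text{ unusual}] \le 2^{-20r/\sqrt{\omega}}$.

Proof. Write $B = \frac{10}{\sqrt{\omega}} r$. By Proposition 2.31 and the fact that $B \le r$,

\[
\Pr[\lambda_1 > B],\ \Pr[\lambda_1' > B] \le \left(\frac{2 e^2 n}{B^2}\right)^B = \left(\frac{2 e^2}{100}\right)^B \le 2^{-1 - 20r/\sqrt{\omega}}.
\]

The lemma now follows from the union bound. □

Turning to (33), let’s write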


c 


(-l)Vfc(A) 


recalling that L^(A) is definitely convergent if A is usual. The infinite sum in (33) is inconvenient, 
as is the +5 in the denominator. We clean these issues up with the following lemma: 


Lemma 5.3. Assuming \ \- n is usual, if 


C > 


31og(10r) 
log(a;/ 10 ) ’ 


it follows that 


2(11 

1l;,(a)-Lc(a)1<^, 




where Lc{\) denotes the same quantity as Lf,{X) except with no +A in the denominator. 

Proof. For any A h n (not necessarily usual), we have the crude bound lp^(A)j < 2y/nB^ whenever 
Ai,A)^ < B. This is because each modified Frobenius coordinate a* or b* (of which there are at 
most y/n each) is at most B. For usual A we may take B = —r. Thus we have 


1l;,(a)-l^(a)1 < 

k=C+l 


\PIW\ 

k{r + 


“ 25(10^)^ “ /lO 

< > , P ■ < 2 r > ' 


k=C+\ 


- 


k=C+l 


V ^ 


“ 250r2’ 


where the last inequality used the assumption about C (and the second-to-last inequality used 
u > 200 in a crude way). Further, 


c 


\LUX) - Lc{X)\ <Y1 


Ip*. (A) I 


1 


1 


k=2 


ir+\f I 


E 


2^(10^y 

CjJ y LB ^ 


\ LB^ LB ' 

k=2 


k 


k 


2 ^fc+i 


1 


UJ 


c 

E 

k=2 


10 


200 


Finally, ^ + 25 ^ ^ ^ by our assumption (28) that co < yT. 


- ^ 3 • 

LO J 


□ 
Let us use this lemma in (33), and also apply (34) in (32). Assuming the lemma’s hypotheses,
we obtain 


<iKL(sw" sw;y < e , [i,,, ■ Lc(a)] + 55 ^ + 


201 


A~sw: 

We can use Cauchy-Schwarz to bound 


A~sw; 

- A,,fwn [1{A unusual} ' ^c(A)] + 


202 


E^,. [1{A ■ ic(A)] < \/E|lf„„.„.„^,|VE[Lo(A)^l < 2-'»--/VE|ic(A)^l, 


(35) 


where the last inequality used Lemma 5.2. Finally, we can afford to use an extraordinarily crude 
bound on E[Lc(A)^]: 

c c 

B[Lc{Xf] < C'^E[pl{Xf] < , 


k=2 


k=2 


where the second inequality used the crude bound on |p^(A)| from the proof of Lemma 5.3. (In 
fact, in Section 5.3 we will actually show that this quantity is quite tiny.) If we now make the very 
weak assumption that C < ^ ^ , we may conclude (35) < <C 

Now we can summarize all of the preparatory work we have done so far: 

Proposition 5.4. Assuming <C< ^ ^ , for X ~ SW" we have 

dKL(sw;i, sw;ij < e [Lc{x)] + 


where 


c 


Lcw ■= 


(-l)V.(A) 

krk 


(36) 


k=2 

(It is straightforward to check using (28) that the range of values for C is nonempty.) 
We now come to the main task: showing that E[Lc(A)] is small. 

5.2.2 Passing to the polynomials 

In this section and the following one, we will use the notation

\[
\mathrm{fact}(\mu) = \prod_{w \ge 1} m_w(\mu)!,
\]

where, recall, $m_w(\mu)$ is the number of parts of $\mu$ equal to $w$.

The following proposition is essentially immediate from known formulas: 

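Proposition 5.5. For any $k \in \mathbb{Z}^+$, we have the following identity on observables:

\[
p_k^* = \sum_{\mu \,:\, \mathrm{wt}(\mu) = k+1} \frac{k^{\downarrow(\ell(\mu) - 1)}}{\mathrm{fact}(\mu)} \cdot p^\#_\mu + O_k,
\]

where $O_k$ is an observable with $\mathrm{wt}(O_k) \le k$. More precisely,

\[
O_k = \sum_{\mu \,:\, \mathrm{wt}(\mu) \le k} c_{k,\mu}\, p^\#_\mu
\]

for some rational coefficients $c_{k,\mu}$.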
Proof. From [IO02, Corollary 2.8] we have


p% = - -• Pk+i + |a linear combination of pk, ■ ■ ■ ,P 2 \ ■ 

From [IO02, Corollary 3.7] (cf. [Mel10b, Lemma 10.10]) we have


Pk+l = 


E 


{k + 






fact(ii) 

/i : wt{fi)=k-\-l ^>1 

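The result is now easily deduced from Proposition 2.39. □

Substituting the above result into (36) yields:

\[
L_C(\lambda) = \sum_{k=2}^{C} \frac{(-1)^k}{k r^k} \sum_{\mathrm{wt}(\mu) = k+1} \frac{k^{\downarrow(\ell(\mu) - 1)}}{\mathrm{fact}(\mu)}\, p^\#_\mu(\lambda) + \sum_{k=2}^{C} \frac{(-1)^k O_k(\lambda)}{k r^k}. \tag{37}
\]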


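Taking the expectation over $\lambda \sim \mathrm{SW}^n_r$, and using Corollary 2.35 to evaluate the expectation of $p^\#_\mu$, we obtain:

\[
\begin{aligned}
\mathop{\mathbf{E}}_{\lambda \sim \mathrm{SW}^n_r}[L_C(\lambda)]
&= \sum_{k=2}^{C} \frac{(-1)^k}{k r^k} \sum_{\mathrm{wt}(\mu) = k+1} \frac{k^{\downarrow(\ell(\mu) - 1)}}{\mathrm{fact}(\mu)} \cdot n^{\downarrow|\mu|}\, r^{\ell(\mu) - |\mu|} && (38) \\
&\quad + \sum_{k=2}^{C} \frac{(-1)^k\, \mathbf{E}_{\lambda \sim \mathrm{SW}^n_r}[O_k(\lambda)]}{k r^k}. && (39)
\end{aligned}
\]

We will show in Lemmas 5.7, 5.8 below that the “error term” (39) is small assuming $n \ll r^2$. Thus we focus on (38).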


5.2.3 Showing the “main term” is small: some intuition

Before diving into manipulations, let’s take a high-level look at the contributions to (38) from $k = 2, 3, 4, 5, \dots$, focusing on the powers of $n$ and $r$. First consider the case of $k = 2$. Here there is only one $\mu$ with $\mathrm{wt}(\mu) = 3$, namely $\mu = (2)$, which has $|\mu| = 2$ and $\ell(\mu) = 1$. Thus from $k = 2$ we pick up a factor on the order of $\frac{n^2}{r^3}$; more precisely, $\frac{n(n-1)}{2r^3}$. This looks rather bad from the point of view of proving a quadratic lower bound for $n$: the term is not small unless $n \ll r^{3/2}$. The main surprise in our proof is that this term will be exactly canceled by “lower-degree” contributions from larger $k$.

To see an example of this, consider the $k = 3$ contribution in (38). Here there are two $\mu$’s with $\mathrm{wt}(\mu) = 4$, namely $\mu = (3)$ and $\mu = (1,1)$. The first gives a contribution on the order of $\frac{n^3}{r^5}$; more precisely, $-\frac{n(n-1)(n-2)}{3r^5}$. The second gives a contribution of $-\frac{n(n-1)}{2r^3}$, thereby precisely canceling the $k = 2$ term. Thus we are left (so far) with $-\frac{n(n-1)(n-2)}{3r^5}$, which is small if $n \ll r^{5/3}$. This is still far from a quadratic bound, but it’s better than the bound we were faced with previously.

In turn, the $-\frac{n(n-1)(n-2)}{3r^5}$ contribution will be canceled by a certain $k = 4$ term, namely $\frac{n(n-1)(n-2)}{r^5}$ from $\mu = (2,1)$, together with a certain $k = 5$ term, namely $-\frac{2n(n-1)(n-2)}{3r^5}$ from $\mu = (1,1,1)$. Indeed, if we sum up through $k = 6$, the total contribution is of order $\frac{n^4}{r^7}$, which is small if $n \ll r^{7/4}$. This gets us still closer to a quadratic bound.
In fact, looking carefully at small partitions suggests that perfect cancelation is achieved if we group contributions according to $|\mu|$. This proves to be the case, as we will show below. In the end (38) does not precisely vanish because for $m > C/2$, not all $\mu$’s with $|\mu| = m$ appear in (38). However the “leftover contributions” are of the shape $r(\frac{1}{\omega})^k$ for $k > C/2$, a quantity we can ensure is small by taking $\omega$ and $C$ large enough. (There is a tradeoff involved preventing us from taking $C$ too large; our “error bound” (39) increases with $C$.)


5.2.4 Proof that the “main term” is small 

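Although (38) has a double summation, the summed quantity is simply counted exactly once for each $\mu$ with $3 \le \mathrm{wt}(\mu) \le C + 1$. As suggested above, let us rearrange the summation according to $|\mu|$. We will use the notation $s = |\mu| - 1$ and $h = \ell(\mu) - 1$, so that $\mathrm{wt}(\mu) = s + h + 2$ (i.e., $k = s + h + 1$) and $\mathrm{wt}(\mu) \le C + 1 \iff h \le C - 1 - s$:

\[
\begin{aligned}
(38) &= \sum_{s=1}^{C-1}\ \sum_{h=0}^{\min(s,\, C-1-s)}\ \sum_{\substack{\mu \vdash s+1 \\ \ell(\mu) = h+1}} \frac{(-1)^{s+h+1}\, (s+h+1)^{\downarrow h}}{(s+h+1)\, r^{s+h+1}\, \mathrm{fact}(\mu)} \cdot n^{\downarrow(s+1)}\, r^{h-s} \\
&= \sum_{s=1}^{C-1} (-1)^{s+1} \frac{n^{\downarrow(s+1)}}{r^{2s+1}} \sum_{h=0}^{\min(s,\, C-1-s)} (-1)^h (s+h)^{\downarrow(h-1)} \sum_{\substack{\mu \vdash s+1 \\ \ell(\mu) = h+1}} \frac{1}{\mathrm{fact}(\mu)}.
\end{aligned}
\]

(We remark that we switched from $r + \tfrac{1}{2}$ to $r$ in Lemma 5.3 so as to obtain nice cancelations on $r$ here. We also recall the convention $m^{\downarrow(-1)} = \frac{1}{m+1}$.) It is not hard to show (see, e.g., [Mel10a, Lemma 11]) that

\[
\sum_{\substack{\mu \vdash s+1 \\ \ell(\mu) = h+1}} \frac{1}{\mathrm{fact}(\mu)} = \frac{1}{(h+1)!} \binom{s}{h}.
\]

Substituting this into the above, and also using $(s+h)^{\downarrow(h-1)} = \frac{(s+h)!}{(s+1)!}$, we get

\[
\begin{aligned}
(38) &= \sum_{s=1}^{C-1} (-1)^{s+1} \frac{n^{\downarrow(s+1)}}{r^{2s+1}} \sum_{h=0}^{\min(s,\, C-1-s)} (-1)^h \frac{(s+h)!}{(s+1)!\,(h+1)!} \binom{s}{h} \\
&= \sum_{s=1}^{C-1} \frac{(-1)^{s+1}}{s+1} \cdot \frac{n^{\downarrow(s+1)}}{r^{2s+1}} \sum_{h=0}^{\min(s,\, C-1-s)} \frac{(-1)^h}{h+1} \binom{s+h}{h} \binom{s}{h}.
\end{aligned}
\]

We now obtain the promised cancelation. Specifically, it is a known combinatorial identity (see, e.g., [GKP94, page 182]) that for all $s \in \mathbb{Z}^+$, the inner summation equals 0 provided $h$ ranges all the way up to $s$. In other words, all contributions from $s < \frac{C}{2}$ vanish. For larger $s$, it’s not hard to bound the inner “partial sum” crudely by, say, $9^s$ in absolute value. We therefore finally conclude:

\[
|(38)| \le \sum_{\frac{C}{2} \le s \le C-1} \frac{1}{s+1} \cdot \frac{n^{\downarrow(s+1)}}{r^{2s+1}} \cdot 9^s. \tag{40}
\]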
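The cancelation used above can be confirmed by direct enumeration. The sketch below is illustrative and rests on an assumed reading of (38): that a partition $\mu$ with $k = |\mu| + \ell(\mu) - 1$ contributes $\frac{(-1)^k k^{\downarrow(\ell(\mu)-1)}}{k \cdot \mathrm{fact}(\mu)} \cdot \frac{n^{\downarrow|\mu|}}{r^{2|\mu|-1}}$. It checks that these coefficients sum to zero over all $\mu$ of a fixed size, that the inner binomial sum vanishes when $h$ runs up to $s$, and that partial sums stay below $9^s$. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod

def partitions(k, largest=None):
    """Yield all partitions of k as weakly decreasing tuples."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def fact(mu):
    """fact(mu): product over part sizes w of (multiplicity of w)!."""
    return prod(factorial(m) for m in Counter(mu).values())

def falling(m, j):
    """Falling factorial m (m-1) ... (m-j+1)."""
    return prod(m - i for i in range(j))

def coefficient(mu):
    """Assumed coefficient of n^{down |mu|} / r^(2|mu|-1) contributed by mu."""
    k = sum(mu) + len(mu) - 1
    return Fraction((-1) ** k * falling(k, len(mu) - 1), k * fact(mu))

def grouped_total(size):
    """Sum of the coefficients over all partitions of the given size."""
    return sum(coefficient(mu) for mu in partitions(size))

def inner_sum(s, upto=None):
    """Sum over h = 0..upto of (-1)^h / (h+1) * C(s+h, h) * C(s, h)."""
    if upto is None:
        upto = s
    return sum(Fraction((-1) ** h * comb(s + h, h) * comb(s, h), h + 1)
               for h in range(upto + 1))

def count_identity_holds(s, h):
    """Check sum of 1/fact(mu) over mu of size s+1 with h+1 parts = C(s,h)/(h+1)!."""
    lhs = sum(Fraction(1, fact(mu)) for mu in partitions(s + 1) if len(mu) == h + 1)
    return lhs == Fraction(comb(s, h), factorial(h + 1))
```

For example, `coefficient((2,))` and `coefficient((1, 1))` are $+\frac{1}{2}$ and $-\frac{1}{2}$, which is the $k = 2$ versus $k = 3$ cancelation described in Section 5.2.3.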
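5.2.5 Bounding the “error term”

In this section we bound the “error term” (39), using the following lemma:

Lemma 5.6. Suppose $n = \frac{r^2}{\omega}$. Then for every partition $\mu$,

\[
0 \le \frac{\mathbf{E}_{\lambda \sim \mathrm{SW}^n_r}\bigl[p^\#_\mu(\lambda)\bigr]}{r^{\mathrm{wt}(\mu)}} \le \Bigl(\frac{1}{\omega}\Bigr)^{|\mu|}.
\]

Proof. By Corollary 2.35, $\mathbf{E}_{\lambda \sim \mathrm{SW}^n_r}[p^\#_\mu(\lambda)] = n^{\downarrow|\mu|}\, r^{\ell(\mu) - |\mu|} \le n^{|\mu|}\, r^{\ell(\mu) - |\mu|} = r^{\mathrm{wt}(\mu)}\, \omega^{-|\mu|}$. □

We will first use this lemma to bound (39) in a “soft” way, thinking of $C$ as an absolute universal constant. This is enough to get a testing lower bound like $n \ge r^{2 - \delta}$ for every $\delta > 0$. Subsequently we do some technical work (which the uninterested reader may skip) to get a more explicit lower bound.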

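Lemma 5.7. For all $C \ge 2$ there is a constant $A_C$ such that $|(39)| \le A_C \cdot \frac{1}{\omega}$.

Proof. It suffices to show that for all $k \ge 2$ there is a constant $A_k$ such that

\[
\left|\frac{\mathbf{E}_{\lambda \sim \mathrm{SW}^n_r}[O_k(\lambda)]}{r^k}\right| \le A_k \cdot \frac{1}{\omega}.
\]

But recalling Proposition 5.5, the left-hand side is

\[
\left|\sum_{\mu \,:\, \mathrm{wt}(\mu) \le k} c_{k,\mu}\, \frac{\mathbf{E}_{\lambda \sim \mathrm{SW}^n_r}\bigl[p^\#_\mu(\lambda)\bigr]}{r^k}\right|,
\]

and each ratio here is at most $(\frac{1}{\omega})^{|\mu|} \le \frac{1}{\omega}$ by Lemma 5.6. This completes the proof. □

Lemma 5.8. In fact, the constants $A_C$ from Lemma 5.7 satisfy $A_C \le \exp(O(C^2 \log C))$.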

Proof. The proof involves some tedious analysis using the results of Section 2.8.1. It suffices to 
show that 

|cfc,^| (41) 

fi-.wt{fi)<k 

where, recall, the coefficients Ck^fi are defined by 

A- E S 

/i : wt(/i)=/c+l fA : wt(^)<k 


Let us return to the relationship between the p* and p'^ polynomials described in Section 2.8.1. 
Specifically, we’ll need identities (7), ( 8 ), which express each p^, as a polynomial in pi, •. ■,via 
the power series Qkit). 

Given any polynomial R in indeterminates pi,... ,pk (either p*’s or p^’s), write ||i?|| for the sum 
of the absolute values of R^s coefficients. This is a submultiplicative norm. Observe from ( 8 ) that 

||Qfc,m|| < {k + 1 )™'+^ (indeed, one may show it’s precisely ^+1 -^)- Thus the coefficient 

on P in Qkity is a polynomial in ..., of norm at most 0{ky. Hence the same is true for 
the coefficient on P in the expression Qk{ty from ( 8 ). As the coefficient on each power 

of t in ni=i(l “ (j “ 5 )^) is a number of magnitude at most {k — , we finally deduce that the 

relationship ( 10 ) can be expressed more quantitatively as 


p{ =Pk + Rk{ph---^Pk-i)^ 


where 1 -|- ||72fc|| < exp( 6 /i; log fc), b a universal constant. 
We inductively invert this relationship as in (11), writing

Pk = Sk{Pi, ■ ■ ■,p^), where S’*, = + |polynomial in ... ,pl._i of gradation at most A; — l|. 

(43) 

If we let s{k) = ||Sfc||, using convexity of exp(6A: log/c) we get the inductive bound 

s{k) < exp{bk log k)s{k — 1), 

leading to the bound s{k) < exp(0(/c^ log A:)). This is nearly enough to complete the proof; the 
only issue is that in (43) we have a polynomial in the p^-’s, whereas in (42) we have the products 
of p^-’s expanded out into linear combinations of p^’s. However Lemma 5.9 below, which crudely 
bounds the magnitude of the structure constants for the p^’s, shows that each monomial n.pl 
with gradation |A| = w can be replaced by a linear polynomial in p^’s (with |p| < w) wherein each 
coefficient has magnitude at most d"' Since w is always bounded by A: — 1 and since there 

are at most 20 {Vk) ^ exp(0(A:^ log A:)) partitions p with |p| < k, we conclude that each of these 
linear polynomials has norm at most exp(0(A:^ log A:)). Thus making these replacements in Sk only 
increases its norm by another multiplicative factor of exp(0(A:^ log/c)). The proof is complete. □ 

i(X) 

Lemma 5.9. Let X\- w, and suppose nd. = E C/iPjj within A*. Then |c^| < 4“'^log*" for all p. 

i=i n 

Proof. The proof is an induction on = i{\), the base case of I" = 1 being trivial. Now for general 
A with \i = k we have 

n=(n ■pI=(Yj pi d^fik. ( 44 ) 

i=l \i = l / \ fl / fl T T fl 

where each |d^| is at most iog(«)-fc) ^ induction. By Corollary 2.37, the 

structure constants satisfy |/^^| < < |p|!A:! < w^. Since the number of partitions of 

(w — k) is trivially at most w'^, the coefficient on pf- in (44) has magnitude at most 

V \d^,flk\ < ■ max|d^| < < 4 *""i°g*", 

completing the induction. □ 

5.2.6 Combining the bounds 

Combining (40), and Lemmas 5.7, 5.8, we get that under the hypotheses of Proposition 5.4, 

dKL(SW)l,SW(lJ <r +exp(0(C'2logC))-^ + ^ <exp(O(C2'0i)).^. (45) 

In the above we used ^ ^ second inequality here following from 

the assumed lower bound on C. It’s now evident that we should take C as small as we can; in 
particular, to equal |~3 1 • We conclude: 
Theorem 5.10. For any $200 \le \omega \le \sqrt{r}$, if $n = r^2/\omega^2$ then

$$d_{\mathrm{KL}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+1}) \le \exp\bigl(O((\log r)/(\log \omega))^{2.01}\bigr) \cdot \frac{1}{\omega^2}.$$

In particular, for $\omega = \exp(\Theta(\log^{.67} r))$ and hence $n = r^{2 - O(1/\log^{.33} r)}$, the above bound is $o_r(1)$.

By Pinsker's inequality we may conclude also that $d_{\mathrm{TV}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+1}) \le o_r(1)$ unless $n \ge r^{2 - O(1/\log^{.33} r)}$. This completes the proof of the rank-$r$ versus rank-$(r+1)$ testing lower bound; in particular, the more precise bound (27) in the case $\Delta = 1$.


5.3 Extension to $\Delta > 1$

Let us henceforth fix the parameter $C = \lceil 3 (\log r)/(\log \omega) \rceil$. To recap the preceding section, we saw that

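$$|\mathbf{E}[L_C(\lambda)]| \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^2}, \quad \text{and hence} \quad d_{\mathrm{KL}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+1}) \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^2}. \qquad (46)$$

If we apply Pinsker's inequality to the latter bound we obtain

$$d_{\mathrm{TV}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+1}) \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega}.$$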

The key to getting a good lower bound when A > 1 is to show that Pinsker’s inequality is not 
sharp in our setting, and in fact the following is true: 

Theorem 5.11. For any $200 \le \omega \le \sqrt{r}$, if $n = r^2/\omega^2$ then

$$d_{\mathrm{TV}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+1}) \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^2}.$$

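From this we can obtain the testing bound (27) for rank-$r$ versus rank-$(r+\Delta)$ (where $1 \le \Delta \le r$) simply by using the triangle inequality. Specifically, given $r \le r' \le 2r$ and $n$, define $\omega_{r'}$ by $n = r'^2/\omega_{r'}^2$. Applying Theorem 5.11 for each $r'$, we get

$$d_{\mathrm{TV}}(\mathrm{SW}^n_{r'}, \mathrm{SW}^n_{r'+1}) \le \exp\bigl(O((\log r')/(\log \omega_{r'}))^{2.01}\bigr) \cdot \frac{1}{\omega_{r'}^2} \quad \text{for all } r \le r' \le 2r.$$

But $\omega_{r'}$ is within a factor of 2 of $\omega_r$ for all $r \le r' \le 2r$; thus by adjusting the constant in the $O(\cdot)$, the above holds with $\omega_r$ in place of $\omega_{r'}$. Applying the triangle inequality, we get

$$d_{\mathrm{TV}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+\Delta}) \le \exp\bigl(O((\log r)/(\log \omega_r))^{2.01}\bigr) \cdot \frac{1}{\omega_r^2} \cdot \Delta.$$

Again, taking $\omega_r = \exp(\Theta(\log^{.67} r))$, we get

$$d_{\mathrm{TV}}(\mathrm{SW}^n_r, \mathrm{SW}^n_{r+\Delta}) \le \frac{n}{r^{2 - O(1/\log^{.33} r)}} \cdot \Delta,$$

and this completes the proof of the rank-testing lower bound (27).

Thus it remains to prove Theorem 5.11. The main result we need for this is the following: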


Theorem 5.12. $\mathrm{Var}_{\lambda \sim \mathrm{SW}^n_r}[L_C(\lambda)] \le \exp(O(C^{2.01})) \cdot \dfrac{1}{\omega^4}$.
To prove Theorem 5.12 we will employ the following lemma:

Lemma 5.13. Let $\mu$ be a partition with $\mathrm{wt}(\mu) = k \ge 2$. Then

$$\mathrm{Var}_{\lambda \sim \mathrm{SW}^n_r}[p^\#_\mu(\lambda)] \le \exp(O(k^2 \log k)) \cdot \frac{r^{2k-2}}{\omega^4}.$$

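Proof. If $|\mu| = 1$ then $p^\#_\mu(\lambda) = n$, which has variance 0. Thus we may assume $|\mu| \ge 2$ and hence $k \ge 3$. Using Proposition 2.39,

$$\mathrm{Var}[p^\#_\mu(\lambda)] = \mathbf{E}[p^\#_\mu(\lambda)^2] - \mathbf{E}[p^\#_\mu(\lambda)]^2 = \mathbf{E}[p^\#_{\mu \cup \mu}(\lambda)] - \mathbf{E}[p^\#_\mu(\lambda)]^2 + \mathbf{E}[q_\mu(\lambda)] \qquad (47)$$

where $q_\mu(\lambda)$ is a certain linear combination of $p^\#_\nu$ polynomials, each of weight at most $2k-2$. Regarding the first two quantities here, Corollary 2.35 tells us that

$$\mathbf{E}[p^\#_{\mu \cup \mu}(\lambda)] - \mathbf{E}[p^\#_\mu(\lambda)]^2 = n^{\downarrow(2|\mu|)} r^{2\ell(\mu)-2|\mu|} - (n^{\downarrow|\mu|})^2 r^{2\ell(\mu)-2|\mu|} = r^{2\ell(\mu)-2|\mu|} \bigl(n^{\downarrow(2|\mu|)} - (n^{\downarrow|\mu|})^2\bigr),$$

which is evidently nonpositive. Thus it suffices to prove the upper bound

$$|\mathbf{E}[q_\mu(\lambda)]| \le \exp(O(k^2 \log k)) \cdot r^{2k-2} \cdot (1/\omega^4). \qquad (48)$$

By Lemma 5.9, the coefficients on the $p^\#_\nu$'s in the linear combination each have magnitude at most $\exp(O(k^2 \log k))$, and there are at most $2^{O(\sqrt{k})}$ of them. Thus (48) follows provided we can show $\mathbf{E}[p^\#_\nu(\lambda)] \le r^{2k-2}/\omega^4$ for all $\nu$ of weight at most $2k-2$. This is immediate from Lemma 5.6 for all $\nu \ne (1)$, and when $\nu = (1)$ it still holds: Lemma 5.6 gives us the bound $r^2/\omega^2 \le r^3/\omega^4 \le r^{2k-2}/\omega^4$, the first inequality using $\omega \le \sqrt{r}$ and the second using $k \ge 3$. □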

We can now prove Theorem 5.12. 

Proof of Theorem 5.12. Recall identity (37): 

y- tt A(-l)'=Ofc(A) 


k=2 


wt{fi)=k+l 


k=2 


krk 


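We claim that for each $2 \le k \le C$,

$$\mathrm{Var}\left[\frac{(-1)^k O_k(\lambda)}{k r^k}\right] \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^4}, \qquad (49)$$

and that furthermore for each $\mu$ of weight $k+1$ we have

$$\mathrm{Var}\left[\frac{(-1)^k\, k^{\downarrow(\ell(\mu)-1)}\, p^\#_\mu(\lambda)}{k r^k\, \mathrm{fact}(\mu)}\right] \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^4}. \qquad (50)$$

This is sufficient to complete the proof, as in general

$$\mathrm{Var}[X_1 + \cdots + X_m] \le m(\mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_m]); \qquad (51)$$

in our particular case we have only $m = \exp(O(\sqrt{C}))$ summands, and this factor can be absorbed into the target variance bound of $\exp(O(C^{2.01})) \cdot (1/\omega^4)$. To verify (49), first recall that each $O_k(\lambda)$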
is a linear combination of $p^\#_\nu(\lambda)$'s for $\nu$ of weight at most $k \le C$; further, the sum of the absolute value of the coefficients is at most $\exp(O(C^{2.01}))$ (see (41)). Using (51) again, it therefore suffices to check that

$$\mathrm{Var}\left[\frac{p^\#_\nu(\lambda)}{r^k}\right] \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^4}$$

when $\mathrm{wt}(\nu) \le k \le C$. By Lemma 5.13 this is true, with a factor of $r^2$ to spare.

To verify (50), we may ignore the factor $(-1)^k/k$ and also ignore the factor $k^{\downarrow(\ell(\mu)-1)}/\mathrm{fact}(\mu)$, as it contributes at most a multiplicative $\ll \exp(O(C^{2.01}))$. Thus it suffices to show $\mathrm{Var}[p^\#_\mu(\lambda)/r^k] \le \exp(O(C^{2.01}))/\omega^4$ for $\mu$ of weight $k+1$ (and $k \le C$). But this is immediate from Lemma 5.13. □


We now work towards the proof of Theorem 5.11. Adding Theorem 5.12 and the square of (46) we obtain

$$\mathbf{E}_{\lambda \sim \mathrm{SW}^n_r}\bigl[L_C(\lambda)^2\bigr] \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^4}. \qquad (52)$$

We would now like to similarly claim that

$$\mathbf{E}_{\lambda \sim \mathrm{SW}^n_{r+1}}\bigl[L^+_C(\lambda)^2\bigr] \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^4}, \qquad (53)$$

where we are writing

$$L^+_C(\lambda) := \sum_{k=2}^{C} \frac{(-1)^k p^*_k(\lambda)}{k (r+1)^k}.$$

To obtain this, it suffices to repeat all of the arguments beginning with Section 5.2.2 until this point; the only thing that really changes is that $\omega = \omega_r$ needs to be replaced with $\omega_{r+1}$, but this has a negligible effect on the bounds (and indeed usually very slightly improves them).

Next, we claim that Lemma 5.3 continues to hold if we replace Lc{\) with the analogous L)),(A). 
The key change to the proof comes in the last main inequality, where we need to observe that the 


\j-k ~ 

continues to hold if the left-hand side is replaced with 

_, 

y(r+A)fc (r-Fl)^y 

We need one more definition for the proof of Theorem 5.11. 

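Definition 5.14. Say that $\lambda \vdash n$ is usual$^+$ if it is usual and if furthermore $|L^+(\lambda)| \le 2$.

Lemma 5.15. Both for $\lambda \sim \mathrm{SW}^n_r$ and $\lambda \sim \mathrm{SW}^n_{r+1}$ it holds that

$$\Pr[\lambda \text{ not usual}^+] \le \exp(O(C^{2.01})) \cdot \frac{1}{\omega^4}.$$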

Proof. For A ~ SW(1, Lemma 5.2 tells us that 

Pr[A not usual] < 2-20"/‘" < 2-^^^ < exp(O(C2-0i)) • ^ 


and it’s easy to check that this is also true with plenty of room to spare for A ~ SWJ^. Thus it 
suffices to verify for both distributions on A that the probability of |L^(A)| < 2 satisfies the same 
upper bound. By applying Markov’s inequality to (52), (53) we get 

Pr JLc(A) 2 > 1], Pr [L+c(A) 2 > 1] < exp(O(C2-0i)) • 


A~sw: 


A~sw: 




Finally, when A is usual and \Lc{^)‘^\ ^ 1, it follows that necessarily |L^(A)| < 2, in light of 
Lemma 5.3 and the fact that ^ < 1. As noted earlier, the r+-analogue of Lemma 5.3 holds, and 
hence we may draw the same conclusion concerning L^(A)^. □ 

Finally we are ready to complete the proof of Theorem 5.11. We begin with 

dTv(SW)l, SW” ) < - Pr [A not usual"''] + - Pr [A not usual"'"] 

+ 2 A~SW" 2 A~SW^ 


I E |sw;i^[A]-SW]l[A] 


+ 2 

usual+ A 

We can bound the first two terms above using Lemma 5.15. Indeed there is room to spare, as the 
bound we get is the square of what we can tolerate. Thus it remains to bound the third term by 
exp(0(C'^-°^)) • -K. For it we use 


E |sw;i^[A]-SW]1[A] 

usuaA A 


= E 

A~SW”, 


= E 

A~SW" 


{A usual+} 


1 - 


SWp[A] 

SW^iX] 


^{A usual+} • |1 -exp(u(A))| 


where 


u{X) = In 


SWP[A] 

sw;i^[A] 


= n In ( 1 + - I — 
r 


n 


r+ i 


T + 


(54) 


(55) 


the last equality holding from (31) (see also the sentence after (33)) under the assumption that A 
is usual (which we can indeed assume, since we’re multiplying against l{Ausuai+})- we noted 
after (33), the first two quantities in (55) sum to a positive quantity not exceeding Fur¬ 

thermore, because of the presence of the usual'''-indicator in (54) we may assume in analyzing (55) 
that |L^(A)| < 2. Thus we may use the bound m(A) < 2 + ^ < 2.01. Since |1 — exp(u)| < 4|u| for 
u G [—2.01,2.01], we may conclude that 

1{A usual+} ' \ 77 ?. + 


(54) <4 E 

A~SW" 




Thus to complete the proof of Theorem 5.11 it remains to show 


E 

A~SW: 


|L;,(A)|]<exp(O(C2-0i))—. 


1 






By the r+-analogue of Lemma 5.3, it suffices to prove this with T^(A) in place of L^(A), because 
201/a;^ <C exp(0(C'^-°^))/a;^. But finally 


E [|T+(A)|] < 

A~SW" ^ 


E [L+(A)2] < exp(O(C'2-0i)) 
A~swr?, L / j 




using Cauchy-Schwarz and (53). The proof of Theorem 5.11 —and hence also the testing lower 
bound (27)— is therefore complete. 
6 Quantum rank testing


6.1 Testers with one-sided error

In this section, we prove the first part of Theorem 1.11, that $\Theta(r^2/\epsilon)$ copies are necessary and sufficient to test whether or not a state has rank $r$ with one-sided error. We will show this by analyzing the following algorithm.

Rank Tester. Given $\rho^{\otimes n}$,

1. Sample $\lambda \sim \mathrm{SW}^n_\rho$.

2. Accept if $\ell(\lambda) \le r$. Reject otherwise.

Our primary tool in analyzing this tester will be the RSK correspondence. Suppose $\rho$'s nonzero eigenvalues are $\eta = (\eta_1, \ldots, \eta_d)$, and let $\mathcal{D}$ be the distribution over $[d]$ induced by $\eta$. By Remark 2.24, $\mathrm{SW}^n_\rho$ has the same distribution as the process which first samples $w \sim \mathcal{D}^{\otimes n}$ and outputs $\lambda = \mathrm{RSK}(w)$. Write $\mathrm{LDS}(w)$ for the length of the longest strongly decreasing subsequence in $w$. By Theorem 2.14, $\ell(\lambda) = \mathrm{LDS}(w)$.
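The identity $\ell(\lambda) = \mathrm{LDS}(w)$ can be checked by brute force on small alphabets. The following is a minimal sketch, not the paper's implementation: the function names `rsk_shape`, `lds`, and `check_all_words` are illustrative, and the insertion rule assumed is standard RSK row insertion (each row weakly increasing, an inserted letter bumps the leftmost strictly larger entry).

```python
from bisect import bisect_right
from itertools import product

def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word` under row insertion."""
    rows = []
    for x in word:
        for row in rows:
            # Index of the leftmost entry strictly greater than x.
            j = bisect_right(row, x)
            if j == len(row):
                row.append(x)          # x fits at the end of this row
                break
            row[j], x = x, row[j]      # bump the displaced entry downward
        else:
            rows.append([x])           # bumped out of every row: new row
    return [len(row) for row in rows]

def lds(word):
    """Length of the longest strictly decreasing subsequence (O(n^2) DP)."""
    best = []
    for i, x in enumerate(word):
        best.append(1 + max((best[j] for j in range(i) if word[j] > x), default=0))
    return max(best, default=0)

def check_all_words(d, n):
    """Compare the number of rows of RSK(w) with LDS(w) for every w in [d]^n."""
    return all(len(rsk_shape(w)) == lds(w)
               for w in product(range(1, d + 1), repeat=n))
```

With this in hand, the Rank Tester's decision on a sampled word `w` is simply `lds(w) <= r`; the exhaustive check is only feasible for very small `d` and `n`.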

The key property we will need of the Rank Tester is the following:

Proposition 6.1. The Rank Tester is the optimal algorithm for testing whether or not a state has rank $r$ with one-sided error.

Proof. To show this, we need to show (i) that every $\lambda$ satisfying $\ell(\lambda) \le r$ occurs with nonzero probability in $\mathrm{SW}^n_\rho$ for some $\rho$ of rank $r$ and (ii) that no $\lambda$ satisfying $\ell(\lambda) > r$ occurs in $\mathrm{SW}^n_\rho$ for any $\rho$ of rank $r$. The first follows because if $\rho$ has $r$ nonzero eigenvalues, then the word

$$w := \underbrace{r, \ldots, r}_{\lambda_r \text{ letters}},\ \underbrace{(r-1), \ldots, (r-1)}_{\lambda_{r-1} \text{ letters}},\ \ldots,\ \underbrace{1, \ldots, 1}_{\lambda_1 \text{ letters}}$$

occurs in $\mathcal{D}^{\otimes n}$ with nonzero probability. It is easy to check that $\lambda = \mathrm{RSK}(w)$.

To show that (ii) holds, if $\rho$ is rank $r$, then $\rho$ has at most $r$ nonzero eigenvalues. Thus, any word $w$ in the support of $\mathcal{D}^{\otimes n}$ will always satisfy $\mathrm{LDS}(w) \le r$ because $w$ will contain at most $r$ distinct letters. As $\ell(\lambda) = \mathrm{LDS}(w)$, we are done. □
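The claim in part (i) that the word $w$ built from $\lambda$ has RSK shape $\lambda$ is easy to confirm mechanically. The sketch below is illustrative only: `word_for_shape` is a hypothetical helper name, the partition is assumed to be given as a list of positive parts in nonincreasing order, and `rsk_shape` is standard row insertion.

```python
from bisect import bisect_right

def rsk_shape(word):
    """Shape of the RSK insertion tableau of `word` under row insertion."""
    rows = []
    for x in word:
        for row in rows:
            j = bisect_right(row, x)   # leftmost entry strictly greater than x
            if j == len(row):
                row.append(x)
                break
            row[j], x = x, row[j]      # bump to the next row
        else:
            rows.append([x])
    return [len(row) for row in rows]

def word_for_shape(shape):
    """Word r^{lam_r} (r-1)^{lam_{r-1}} ... 1^{lam_1} for the partition `shape`.

    `shape` lists the parts lam_1 >= lam_2 >= ... >= lam_r > 0.
    """
    word = []
    for letter in range(len(shape), 0, -1):
        word.extend([letter] * shape[letter - 1])
    return word
```

For example, `word_for_shape([3, 3])` is `[2, 2, 2, 1, 1, 1]`, and inserting it row by row gives two rows of length 3.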

As a result of Proposition 6.1, Theorem 1.11 follows from the following lemma.

Lemma 6.2. The Rank Tester tests whether or not a state has rank $r$ with $\Theta(r^2/\epsilon)$ copies.

Proof. If $\rho$ is $\epsilon$-far from having rank $r$, then $\mathcal{D}$ is $\epsilon$-far in TV distance from having support size $r$. Thus, we can show the lemma by showing the following two facts about probability distributions.

(i) For every probability distribution $\mathcal{D} = (p_1, \ldots, p_d)$ which is $\epsilon$-far from having support size $r$, a random word $w \sim \mathcal{D}^{\otimes n}$ satisfies $\mathrm{LDS}(w) \ge r+1$ with probability at least 2/3 for some $n = O(r^2/\epsilon)$.

(ii) There exists an integer $d$ and a probability distribution $\mathcal{D} = (p_1, \ldots, p_d)$ which is $\epsilon$-far from having support size $r$ such that, for a random word $w \sim \mathcal{D}^{\otimes n}$, $\mathrm{LDS}(w) \le r$ with probability greater than 1/3 whenever $n = o(r^2/\epsilon)$.
Proof of statement (i): We will need the following concentration bound for sums of geometric random variables.

Proposition 6.3 ([Bro]). Write $X = X_1 + \cdots + X_n$, where the $X_i$'s are iid geometric random variables with expectation $\mu$. For any $k > 1$, $\Pr[X > kn\mu] \le \exp\bigl(-\tfrac{1}{2} kn(1 - 1/k)^2\bigr)$.

We note that Proposition 6.3 also holds with the weaker hypothesis that the $X_i$'s are independent (and not necessarily identically distributed), each with expectation at most $\mu$.
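To see how this bound is used in the case analysis that follows, one can evaluate its right-hand side numerically. The sketch below is a hedged illustration: it reads the constant in the exponent as 1/2 (an assumption where the extracted text is unclear), and it plugs in the parameters of case (1) below, namely $r+1$ geometric variables with mean at most $4r/\epsilon$ and threshold $24r^2/\epsilon$, so that $k = 6r/(r+1)$. The function names are illustrative.

```python
import math

def geometric_sum_tail_bound(k, n):
    """Right-hand side of Proposition 6.3, read as exp(-(1/2) k n (1 - 1/k)^2)."""
    if k <= 1:
        raise ValueError("the bound is stated only for k > 1")
    return math.exp(-0.5 * k * n * (1.0 - 1.0 / k) ** 2)

def case_one_failure_bound(r):
    """Bound on Pr[X > 24 r^2 / eps] for r + 1 geometrics with mean <= 4 r / eps.

    The threshold equals k * (r + 1) * (4 r / eps) with k = 6 r / (r + 1),
    so eps cancels and the bound depends on r alone.
    """
    k = 6.0 * r / (r + 1)
    return geometric_sum_tail_bound(k, r + 1)
```

Under this reading the failure bound is $\exp(-(5r-1)^2/(12r))$, which is already below 1/3 at $r = 1$ and decreases as $r$ grows.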

We may assume that $p_1 \ge \cdots \ge p_d$. We will split into two cases, handled below: (1) $p_{r+1} \ge \epsilon/4r$ and (2) $p_{r+1} < \epsilon/4r$.

(1) Because the probabilities are sorted, $p_1, \ldots, p_{r+1} \ge \epsilon/4r$. For the infinite random word $w \sim \mathcal{D}^{\otimes \infty}$, consider the number of letters one has to traverse through before finding $(r+1), r, \ldots, 1$ as a subsequence. This number is distributed as $X = X_{r+1} + \cdots + X_1$, where $X_i$ is a geometric random variable with success probability $p_i$.

By assumption, $p_i \ge \epsilon/4r$, and therefore $\mathbf{E} X_i \le 4r/\epsilon$, for each $i \in [r+1]$. By Proposition 6.3, $X$ is at most $24r^2/\epsilon$ with probability at least 2/3. Thus, if $n = 24r^2/\epsilon$, then $w \sim \mathcal{D}^{\otimes n}$ has a strictly decreasing subsequence of size $r+1$ with high probability.

(2) Because the probabilities are sorted, $p_{r+1}, \ldots, p_d < \epsilon/4r$. Place the letters from $\{r+1, \ldots, d\}$ into buckets as follows: starting from letter $(r+1)$ and proceeding in order, add each letter to the current bucket until it contains at least $\epsilon/4r$ weight. At this point, move to the next bucket and repeat this process starting with the current letter until all letters have been bucketed.

Because these letters have weight $< \epsilon/4r$, each bucket has total weight in the interval $[\epsilon/4r, \epsilon/2r)$ (except possibly the final bucket). There must be at least $2r+1$ buckets with nonzero total weight, as otherwise $p_{r+1} + \cdots + p_d < \epsilon$, contradicting the fact that $\mathcal{D}$ is $\epsilon$-far from having support size $r$. This gives us at least $2r \ge r+1$ buckets each of which contains at least $\epsilon/4r$ total weight.

Now we can use an argument similar to case (1) to show that when $n = 24r^2/\epsilon$, a random $w \sim \mathcal{D}^{\otimes n}$ will with probability $\ge 2/3$ have a strictly decreasing subsequence in which the first letter comes from bucket $r+1$, the second letter comes from bucket $r$, and so on (ending in a letter from the first bucket). This is a strictly decreasing subsequence of size $r+1$.
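The bucketing step in case (2) can be sketched as a single greedy pass. This is one reading of the procedure described above, under the assumption that a bucket is closed as soon as its weight reaches the threshold $\epsilon/4r$; the name `bucket_letters` and the use of exact fractions are illustrative choices, not part of the original argument.

```python
from fractions import Fraction

def bucket_letters(tail_probs, threshold):
    """Greedily group consecutive letters until each bucket weighs >= threshold.

    tail_probs: the probabilities p_{r+1}, ..., p_d, each assumed < threshold.
    Returns a list of buckets, each a list of 0-based indices into tail_probs.
    """
    buckets, current, weight = [], [], 0
    for idx, p in enumerate(tail_probs):
        current.append(idx)
        weight += p
        if weight >= threshold:
            buckets.append(current)      # bucket is full: close it
            current, weight = [], 0
    if current:
        buckets.append(current)          # possibly underweight final bucket
    return buckets
```

For instance, with $r = 2$ and $\epsilon = 1/4$ the threshold is $1/32$; forty tail letters of weight $1/128$ each (total weight $40/128 \ge \epsilon$) fall into ten buckets of weight exactly $1/32$, comfortably more than the $2r + 1 = 5$ buckets the argument needs.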


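Proof of statement (ii): For $d \gg r$, define the probability distribution

$$\mathcal{D} = \left(1 - 2\epsilon,\ \frac{2\epsilon}{d-1},\ \ldots,\ \frac{2\epsilon}{d-1}\right).$$

Because $d \gg r$, $\mathcal{D}$ is $\epsilon$-far from having support size $r$. For a string $w \in [d]^n$, let $\widetilde{w}$ be the substring of $w$ formed by deleting all occurrences of the letter "1" from $w$. It is easy to see that $\mathrm{LDS}(\widetilde{w}) \le \mathrm{LDS}(w) \le \mathrm{LDS}(\widetilde{w}) + 1$.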
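The sandwich inequality holds because a strictly decreasing subsequence can contain the letter 1 at most once, and only as its last element. A small randomized check, written as an illustrative sketch with hypothetical helper names, looks like this:

```python
import random

def lds(word):
    """Length of the longest strictly decreasing subsequence (O(n^2) DP)."""
    best = []
    for i, x in enumerate(word):
        best.append(1 + max((best[j] for j in range(i) if word[j] > x), default=0))
    return max(best, default=0)

def delete_letter(word, letter=1):
    """The substring of `word` with every occurrence of `letter` removed."""
    return [x for x in word if x != letter]

def sandwich_holds(word):
    """Check LDS(w~) <= LDS(w) <= LDS(w~) + 1, where w~ deletes all 1's."""
    short = lds(delete_letter(word))
    return short <= lds(word) <= short + 1
```

Both ends of the sandwich are attained: for `[3, 2, 1]` deleting the 1 lowers the LDS from 3 to 2, while for `[1, 3, 2]` it leaves the LDS at 2.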

For a randomly drawn $w \sim \mathcal{D}^{\otimes n}$, let us condition on $\widetilde{w}$ having a certain fixed length $m$. The value of $\mathrm{LDS}(\widetilde{w})$ is distributed as the length of the longest decreasing subsequence in a uniformly random word drawn from $[d-1]^m$. By Theorem 2.14, this is distributed as $\lambda'_1$ for $\lambda \sim \mathrm{SW}^m_{d-1}$. Setting $B = \lceil 100\sqrt{m} \rceil$, let us show that $\Pr[\lambda'_1 \ge B]$ is small. If $B \ge d$, then surely $\lambda'_1 < B$ always, as $\lambda \sim \mathrm{SW}^m_{d-1}$ will always have height at most $d-1$. On the other hand, if $B < d$, then by Proposition 2.31,

$$\Pr[\lambda'_1 \ge B] \le \left(\frac{2e^2 m}{B^2}\right)^B \le \frac{2e^2}{10000}.$$

In summary, conditioned on $\widetilde{w}$ having a certain fixed length $m$, $\mathrm{LDS}(\widetilde{w}) \le O(\sqrt{m})$ with all but the above probability.

In expectation, for a random $w \sim \mathcal{D}^{\otimes n}$, $\widetilde{w}$ has length $2\epsilon n$. By Markov's inequality, the probability that the length of $\widetilde{w}$ is greater than $200\epsilon n$ is at most 1/100. Conditioned on the length of $\widetilde{w}$ being at most $200\epsilon n$, the above paragraph tells us that $\mathrm{LDS}(\widetilde{w}) \le O(\sqrt{\epsilon n})$ with probability $1 - 2e^2/10000$. Thus, when $w \sim \mathcal{D}^{\otimes n}$, we have with probability greater than 1/3 that $\mathrm{LDS}(\widetilde{w}) \le O(\sqrt{\epsilon n})$, which is $o(r)$ unless $n = \Omega(r^2/\epsilon)$. □

For our last result of this section, we will show that the copy complexity of the Rank Tester can be improved in certain interesting cases. In particular, the Rank Tester matches the upper bound of the Uniform Distribution Distinguisher from Section 5 for the case of $r$ vs. $r+1$, and does so with one-sided error.

Proposition 6.4. The Rank Tester can distinguish between the case when $\rho$'s spectrum is uniform on either $r$ or $r+1$ eigenvalues with $O(r^2)$ copies of $\rho$.

Proof. If $\rho$'s spectrum is uniform on $r$ eigenvalues, then it is rank $r$ and so the Rank Tester never rejects. Thus, we need only show that the Rank Tester rejects with probability $\ge 2/3$ when $\rho$'s spectrum is uniform on $r+1$ eigenvalues for some $n = O(r^2)$. We will follow the analysis in the proof of statement (i) above and show that a random word $w \sim \mathcal{D}^{\otimes n}$ has $\mathrm{LDS}(w) = r+1$ with high probability. The gain will come from the fact that $\eta = (1/(r+1), \ldots, 1/(r+1))$.

For the infinite random word $w \sim \mathcal{D}^{\otimes \infty}$, consider the number of letters one has to traverse through before finding $(r+1), r, \ldots, 1$ as a subsequence. This number is distributed as $X = X_{r+1} + \cdots + X_1$, where $X_i$ is a geometric random variable with success probability $1/(r+1)$ and expectation $r+1$. By Proposition 6.3, $X$ is at most $6r^2$ with probability at least 2/3. Thus, if $n = 6r^2$, then $w \sim \mathcal{D}^{\otimes n}$ has a strictly decreasing subsequence of size $r+1$ with high probability. □
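The argument above can be illustrated with a small Monte Carlo experiment on the classical side (sampling words rather than quantum states). This is only a sketch: the function names, the choice $r = 5$, the trial count, and the seed are illustrative assumptions. It uses the fact that over the alphabet $[r+1]$, a word has $\mathrm{LDS}(w) = r+1$ exactly when it contains $(r+1), r, \ldots, 1$ as a subsequence, which a greedy scan detects.

```python
import random

def contains_full_decreasing_run(word, top):
    """True iff top, top-1, ..., 1 occurs as a subsequence of `word` (greedy scan)."""
    want = top
    for x in word:
        if x == want:
            want -= 1
            if want == 0:
                return True
    return False

def rejection_rate(r, trials=200, seed=0):
    """Fraction of uniform words over [r+1] of length 6 r^2 that the tester rejects."""
    rng = random.Random(seed)
    n = 6 * r * r
    hits = 0
    for _ in range(trials):
        word = [rng.randint(1, r + 1) for _ in range(n)]
        hits += contains_full_decreasing_run(word, r + 1)
    return hits / trials
```

A word over the smaller alphabet $[r]$ never contains the letter $r+1$, so the same check never fires there, matching the one-sided error guarantee.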

6.2 A lower bound for testers with two-sided error

In this section, we prove the second part of Theorem 1.11, that $\Omega(r/\epsilon)$ copies are necessary to test whether or not a state has rank $r$ with two-sided error.

Proof. Let $d \gg r$. In this proof, we will take the viewpoint of a density matrix as a probability distribution over pure states. Let $\rho$ and $\sigma$ be maximally mixed on subspaces of dimension $(r-1)$ and $(d-1)$, respectively. Consider the following process for generating a product state $|\Psi\rangle = |\Psi_1\rangle \otimes \cdots \otimes |\Psi_n\rangle$:

1. Let $x \in \{0,1\}^n$ be a random $2\epsilon$-biased string, meaning each coordinate is selected independently according to $\Pr[x_i = 1] = 2\epsilon$.

2. For each $i \in [n]$ such that $x_i = 0$, set $|\Psi_i\rangle := |d\rangle$.

3. Let $b$ be an arbitrary $\{0,1\}$-bit. For each $i \in [n]$ such that $x_i = 1$,

(a) if $b = 0$, then set $|\Psi_i\rangle$ to be a state vector sampled from $\rho$.

(b) if $b = 1$, then set $|\Psi_i\rangle$ to be a state vector sampled from $\sigma$.

If $b$ is 0, then the mixed state output by this procedure has spectrum $\left(1 - 2\epsilon, \frac{2\epsilon}{r-1}, \ldots, \frac{2\epsilon}{r-1}\right)$, which is rank $r$. On the other hand, if $b$ is 1, then the mixed state output by this procedure has spectrum $\left(1 - 2\epsilon, \frac{2\epsilon}{d-1}, \ldots, \frac{2\epsilon}{d-1}\right)$, which because $d \gg r$ is $\epsilon$-far from having rank $r$.
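The two spectra can be written down explicitly and their distance from rank $r$ computed. The sketch below makes one assumption worth flagging: distance is measured as total variation distance between spectra, in which case the distance from a distribution to the nearest one of support size at most $r$ equals the mass outside its $r$ largest entries. The helper names are illustrative.

```python
from fractions import Fraction

def mixture_spectrum(eps, k):
    """Spectrum (1 - 2 eps, 2 eps / k, ..., 2 eps / k) with k small eigenvalues."""
    return [1 - 2 * eps] + [2 * eps / k] * k

def distance_to_rank(spectrum, r):
    """TV distance from `spectrum` to the nearest distribution of support size <= r.

    Assumes TV distance between spectra; this is the total mass outside
    the r largest entries.
    """
    return sum(sorted(spectrum, reverse=True)[r:])
```

For example, with $\epsilon = 1/10$ and $r = 3$, the $b = 0$ spectrum `mixture_spectrum(eps, 2)` has distance 0 from rank 3, while the $b = 1$ spectrum with $d = 101$, `mixture_spectrum(eps, 100)`, has distance $98 \cdot \frac{1}{500} = \frac{49}{250} \ge \epsilon$. In general the distance is $2\epsilon(d-r)/(d-1)$, which is at least $\epsilon$ once $d \ge 2r - 1$.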
Let us consider the choice of $x$ in the first step, and set $\mathrm{wt}(x)$ to be the number of 1's in $x$. In expectation, $\mathrm{wt}(x)$ will be $2\epsilon n$, and so by Markov's inequality $\mathrm{wt}(x)$ will be at most $200\epsilon n$ with probability at least 99/100. There must exist an $x$ with $\mathrm{wt}(x) \le 200\epsilon n$ conditioned on which the algorithm succeeds with probability at least 3/5, as otherwise it will succeed in total with probability at most $1/100 + 99/100 \cdot 3/5 < 2/3$.

Fix any such $x$. The job of the algorithm is reduced to distinguishing between the cases when those $|\Psi_i\rangle$'s for which $x_i = 1$ came from $\rho$, which is maximally mixed on a subspace of dimension $(r-1)$ (when $b = 0$), or from $\sigma$, which is maximally mixed on a subspace of dimension $(d-1)$ (when $b = 1$). Because $d \gg r$, we have by Theorem 1.9 that this requires at least $\Omega(r)$ copies to succeed with probability at least 3/5. Thus, we must have $200\epsilon n \ge \Omega(r)$, in which case $n = \Omega(r/\epsilon)$. □
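The averaging step in the first paragraph of this proof is a one-line computation, reproduced here with exact arithmetic as a sanity check. The function name is illustrative; the inputs are the two probabilities quoted in the text.

```python
from fractions import Fraction

def overall_success_upper_bound(p_heavy, p_cond):
    """Upper bound on the total success probability.

    p_heavy: bound on Pr[wt(x) > 200 eps n]; success is granted on this event.
    p_cond:  assumed cap on the success probability for every light x.
    """
    return p_heavy + (1 - p_heavy) * p_cond
```

With `p_heavy = 1/100` and `p_cond = 3/5` this evaluates to $151/250 = 0.604$, which is indeed below $2/3$, so some light $x$ must allow success probability at least 3/5.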

7 The EYD lower bound (continued)

In this section, we prove Theorem 3.4.

Theorem 3.4 restated. For every constant $C > 0$, there are constants $\delta, \epsilon > 0$ such that

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}[d_{\mathrm{TV}}(\lambda, \mathrm{Unif}_d) \ge \epsilon] \ge \delta$$

when $n \le Cd^2$ and $d$ is sufficiently large.

Proof. To prove Theorem 3.4, we show, at a high level, that when $n \le Cd^2$, Biane's law of large numbers kicks in and $\lambda$ approaches the limiting curve $\Omega_\theta$, for $\theta := \sqrt{n}/d$. Each of these curves is constantly far from the curve produced by the uniform partition, and the lower bound follows. However, carrying out this proof involves some subtle argumentation and splitting of hairs which we will go into.

There is one regime where $\lambda$ certainly does not approach $\Omega_\theta$: when $n$ is a fixed value independent of the value of $d$, then $\lambda$ will always be constantly far from $\Omega_\theta$. However, we can rule this case out by noting that when $n$ is too small as a function of $d$, then any $\lambda = (\lambda_1, \ldots, \lambda_d)$ with $n$ boxes will have most of its $\lambda_i$'s zero, and so $\lambda$ will be far from uniform. In particular, when $n = o(d)$, then we have that $d_{\mathrm{TV}}(\lambda, \mathrm{Unif}_d) \to 1$ as $d \to \infty$. As a result, for sufficiently large $d$ we can immediately assume that $n \ge f(d)$, where $f(d)$ is any function which is both $\omega_d(1)$ and $o(d)$. For concreteness, we will take $f(d) := \sqrt{d}$.
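The small-$n$ regime can be made quantitative: a partition of $n$ has at most $n$ nonzero parts, so its normalization puts all of its mass on a set that the uniform distribution gives weight at most $n/d$, and the total variation distance is at least $1 - n/d$. The sketch below assumes that the partition is compared to $\mathrm{Unif}_d$ after dividing by $n$; the function name is illustrative.

```python
from fractions import Fraction

def tv_from_uniform(partition, d):
    """TV distance between lam / n (padded with zeros to d parts) and Unif_d.

    Assumes len(partition) <= d and that the partition is normalized by
    n = sum(partition) before comparison.
    """
    n = sum(partition)
    parts = list(partition) + [0] * (d - len(partition))
    return sum(abs(Fraction(p, n) - Fraction(1, d)) for p in parts) / 2
```

For $n = 5$ and $d = 50$ the single-column partition `[1, 1, 1, 1, 1]` attains the bound exactly ($9/10$), while the single-row partition `[5]` is even farther from uniform ($49/50$).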

We are now in the regime where Biane's law of large numbers holds. Theorem 2.30 tells us that if $\sqrt{n}/d \sim c$ for $c$ some absolute constant, then there is some constant $d(c) > 0$ such that for a random $\lambda \sim \mathrm{SW}^n_d$, $\lambda$ is $\epsilon$-close (in $L^\infty$ distance) to $\Omega_c$ whenever $d \ge d(c)$. The main difficulty we have in applying Biane's law of large numbers directly is that the function $d(c)$ is left unspecified and, for example, could be wildly different even for two close values of $c$. This is problematic in our case, because for each value of $d$, the ratio $\theta = \sqrt{n}/d$ may be any real number in the interval $[\sqrt{f(d)}/d, \sqrt{C}]$, and so $\theta$ may jump around and never converge to a fixed value $c$. In particular, an adversary could potentially choose $n$ (and therefore $\theta$) as a function of $d$ cleverly so that for each $d$, we have that $d < d(\theta)$, and so Biane's law of large numbers never applies. Though seemingly unlikely, this possibility is not ruled out by the statements of known theorems.

Our goal now is to show that the convergence to the limiting shapes guaranteed by Biane's theorem happens at roughly the same rate for all values of $\theta$ in our interval. First we will need a definition.
Definition 7.1. Given continual diagrams $f, g : \mathbb{R} \to \mathbb{R}$, the distance between them is

$$d_1(f, g) := \int_{\mathbb{R}} |f(x) - g(x)|\,dx.$$

This defines a metric on the set of continual diagrams, and it is well-defined because $f(x) - g(x) = 0$ whenever $|x|$ is sufficiently large. If $\lambda, \mu$ are both partitions of $n$, then $d_1(\lambda, \mu) = 4 \cdot d_{\mathrm{TV}}(\lambda, \mu)$.

We will prove the following result:

Theorem 7.2. Let $C > 0$ be an absolute constant, and let $f(d) : \mathbb{N} \to \mathbb{N}$ be $\omega_d(1)$. Then for any constant $0 < \delta < 1$, if $f(d) \le n \le Cd^2$, then

$$\Pr_{\lambda \sim \mathrm{SW}^n_d}[d_1(\lambda, \Omega_\theta) \ge \delta] \le \delta,$$

for sufficiently large $d$, where $\theta = \sqrt{n}/d$.

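Let us now complete the argument assuming Theorem 7.2. For $\kappa > 0$, define the following continual diagram:

$$\mathrm{unif}_\kappa(x) := \begin{cases} x + \frac{2}{\kappa} & \text{if } x \in (-\frac{1}{\kappa}, \kappa - \frac{1}{\kappa}], \\ -x + 2\kappa & \text{if } x \in (\kappa - \frac{1}{\kappa}, \kappa), \\ |x| & \text{otherwise.} \end{cases} \qquad (56)$$

To see how such a function arises, consider the uniform "partition" $(\frac{n}{d}, \ldots, \frac{n}{d})$ ("partition" being in quotation marks because $\frac{n}{d}$ may not be integral). Drawing this in the French notation gives a rectangle of width $\frac{n}{d}$ and height $d$ whose bottom-left corner is the origin. Drawing this in the Russian notation and dilating by a factor of $1/\sqrt{n}$ therefore gives the curve $\mathrm{unif}_\theta(x)$. One consequence of this is that if $\lambda$ is a partition of $n$, then $d_1(\lambda, \mathrm{unif}_\theta) = 4 \cdot d_{\mathrm{TV}}(\lambda, \mathrm{Unif}_d)$.

Define the function $\Delta : (0, \sqrt{C}] \to \mathbb{R}^{\ge 0}$ by $\Delta(\kappa) := d_1(\mathrm{unif}_\kappa, \Omega_\kappa)$. When $\kappa \le .3$, $\Delta(\kappa) \ge .5$ for all $c$. This is because $\Omega_\kappa(x) = -x$ for all $x \le -2$ regardless of $\kappa$, whereas $\mathrm{unif}_\kappa(x) = -x + 2\kappa$ in $(\kappa - \frac{1}{\kappa}, -2]$. Because $\kappa \le .3$,

$$d_1(\mathrm{unif}_\kappa, \Omega_\kappa) = \int_{\mathbb{R}} |\mathrm{unif}_\kappa(x) - \Omega_\kappa(x)|\,dx \ge 2\kappa \cdot \left(\tfrac{1}{\kappa} - 2 - \kappa\right) \ge 0.5.$$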
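The piecewise curve and the final numerical inequality can be checked directly. The sketch below is an illustration under stated assumptions: the curve is taken to be the Russian-notation profile of a rectangle of width $\kappa$ and height $1/\kappa$, so that its rising branch is $x + 2/\kappa$ on $(-1/\kappa, \kappa - 1/\kappa]$ (this branch is a reconstruction from the rectangle description, not a quotation), and the function names are hypothetical.

```python
def unif(kappa, x):
    """Profile of a kappa-by-(1/kappa) rectangle in Russian notation (assumed form)."""
    if -1.0 / kappa < x <= kappa - 1.0 / kappa:
        return x + 2.0 / kappa          # rising edge of the rectangle
    if kappa - 1.0 / kappa < x < kappa:
        return -x + 2.0 * kappa         # falling edge of the rectangle
    return abs(x)                       # outside the rectangle

def delta_lower_bound(kappa):
    """The quantity 2 kappa (1/kappa - 2 - kappa) = 2 - 4 kappa - 2 kappa^2."""
    return 2.0 * kappa * (1.0 / kappa - 2.0 - kappa)
```

Since $2 - 4\kappa - 2\kappa^2$ is decreasing in $\kappa > 0$, its minimum over $(0, .3]$ is at $\kappa = .3$, where it equals $0.62$, consistent with the stated bound of $0.5$.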

Now, let us lower-bound $\Delta(\kappa)$ when $\kappa \ge .3$. Write $I$ for the interval $[.3, \sqrt{C}]$. (If $.3 > \sqrt{C}$ then this step can be skipped.) To begin, we note that $\Delta(\kappa)$ is continuous on $I$. By comparing (56) with Theorem 2.30, it is easy to see that $\Delta(\kappa) > 0$ for all $\kappa > 0$. We can now apply the extreme value theorem, which implies that $\Delta$ achieves its minimum on $I$ at some fixed point $\kappa^* \in I$. We therefore have that $\Delta(\kappa) \ge \Delta(\kappa^*) > 0$ for all $\kappa \in I$.

Combining the last two paragraphs, we now know that there is some value

$$\delta := \min\{0.5, \Delta(\kappa^*)\} > 0$$

such that $\Delta(\kappa) \ge \delta$ for all $\kappa \in (0, \sqrt{C}]$. Crucially, $\delta$ is an absolute constant which depends only on the constant $C$ and is independent of $n$ and $d$. Now, let us apply Theorem 7.2 with the values $f(d) = \sqrt{d}$, $C$, and $\frac{\delta}{2}$. Then with probability at least $1 - \frac{\delta}{2}$, $d_1(\lambda, \Omega_\theta) \le \frac{\delta}{2}$. When this occurs,

$$d_{\mathrm{TV}}(\lambda, \mathrm{Unif}_d) = \tfrac{1}{4} d_1(\lambda, \mathrm{unif}_\theta) \ge \tfrac{1}{4}\bigl(d_1(\Omega_\theta, \mathrm{unif}_\theta) - d_1(\lambda, \Omega_\theta)\bigr) \ge \tfrac{\delta}{8},$$

where the second step follows from the triangle inequality, and the third step uses the fact that $d_1(\Omega_\theta, \mathrm{unif}_\theta) = \Delta(\theta) \ge \delta$. This proves the theorem with the parameters $1 - \frac{\delta}{2}$ and $\frac{\delta}{8}$. □

It remains to prove Theorem 7.2, and this is done in the next subsection.
7.1 Proof of Theorem 7.2


Our goal is to give a rate of convergence of $\lambda$ to $\Omega_\theta$ which depends only on $d$ and is independent of $n$. To do this, we will show that standard law of large numbers arguments give convergence rates of this form. Biane's [Bia01] proof of the law of large numbers for the Schur-Weyl distribution does not use Kerov's algebra of observables. Instead, we will follow the proof of the law of large numbers (second form) for the Plancherel distribution in [IO02, Theorem 5.5] and use results from [Mel10a] to extend this proof to the Schur-Weyl distribution. We emphasize that our proof contains no ideas not already found in [IO02, Mel10a], and that our goal is just to show that proper bookkeeping of their arguments yields our Theorem 7.2. (Finally, we note that Méliot [Mel10a] also sketches a proof of the law of large numbers for the Schur-Weyl distribution using Kerov's algebra of observables at the beginning of his Section 3.)

Write $\Delta_\lambda(x) := \lambda(x) - \Omega_\theta(x)$. Because $\lambda$ and $\Omega_\theta$ are both continual diagrams, we know that $\Delta_\lambda$ is supported (i.e., nonzero) on a finite interval. We will need a stronger property, which is that the width of this interval does not grow with $d$ (or, equivalently, with $n$). To show this, note that $\Delta_\lambda(x)$ is zero when both $\Omega_\theta(x) = |x|$ and $\lambda(x) = |x|$. For the first of these, we can consult Theorem 2.30 and see that $\Omega_\theta(x) = |x|$ outside the interval $[-2, \theta + 2]$. On the other hand, $\lambda(x)$ does not equal $|x|$ outside a constant-width interval for all $\lambda \sim \mathrm{SW}^n_d$. (For example, with nonzero probability $\lambda = (n)$, in which case $\lambda(x) = |x|$ only outside the interval $[-1/\sqrt{n}, \sqrt{n}]$.) However, the next proposition shows that our desired property occurs with high probability.

Proposition 7.3. With probability $1 - \delta/2$, $\lambda(x) \ne |x|$ only on an interval of width $w = O_\delta(1)$.

Proof. We will show that $\lambda_1$ and $\lambda'_1 \le \beta\sqrt{n}$, each with probability $1 - \delta/4$, for some constant $\beta$ which depends only on $\delta$ (and $C$). The proposition will then follow from the union bound, as $\lambda(x) = |x|$ outside the interval $[-\lambda'_1/\sqrt{n}, \lambda_1/\sqrt{n}]$. By Proposition 2.31,


Pr[Ai > /3Vn],Pr[A/ > |3^/n] < ^ 


^ (1 + /30)e2 ^ (1 + ^VC)e2 
- /32 - /32 


This can be made less than $\delta/4$ by choosing $\beta$ to be a sufficiently large function of $C$ and $\delta$. □

Let $I'$ be the constant-width interval guaranteed by Proposition 7.3. Clearly, $I'$ contains the point zero. Thus, if we define

$$I := [-2, \theta + 2] \cup I'$$

then this is a single interval of width $w = O_\delta(1)$. This motivates the following definition:


Definition 7.4. We say that $\lambda$ is usual if $\Delta_\lambda$ is supported on $I$. By the previous discussion, a random $\lambda$ is usual with probability $1 - \delta/2$.

Let us condition A on it being usual, and let us suppose that di(A, H^) > 5. Then there is 
some point x G / such that jAA(x)j > Now we will use the fact that Qq and A are continual 
diagrams, which implies that they are both 1-Lipschitz, and therefore A^ is 2-Lipschitz. Then if 
we consider the subinterval /a; C / defined as Ix ■= [x — x + ^], this Lipschitz property implies 
that jAA(y)l > ^ for all y & Ix- (That Ix is contained in I follows from the fact that A^ is nonzero 
on Ix and A is usual.) We note that the width of /a, is 

Let 77 be a set of [closed intervals of width ^ which cover I. These intervals are chosen 
to have half the width of Ix, the result being that there is some interval J* ^ ff which is completely 
contained in $I_x$. For each interval $J \in \mathcal{J}$, let $\Psi_J : \mathbb{R} \to \mathbb{R}^{\ge 0}$ be a continuous function supported on $J$ which satisfies $\int \Psi_J(y)\,dy = 1$ (such functions are known to exist; e.g., bump functions). Then


j*{y)dy 


> min|AA(y)| 

y&ix 


^j*iy)dy > 




By the Weierstrass approximation theorem, we can approximate each Tj with a polynomial 
function Tj such that for each x € I, I'k j{x) — 'k j(x)| < (Outside of /, Tj can—and will—be 
an arbitrarily bad approximator for Tj.) Because A;^ is 2-Lipschitz and A is usual, |A;^(x)| < 2w 
for all X G / and is zero everywhere else. As a result, for the interval J*, 



AA(y)^j* {y)dy 


> 



^x{y)^j*{y)dy 



^xiy) (^j*(y) 




dy 


4:W’ 


The first inequality uses the triangle inequality, and the second inequality uses crucially the fact 
that Aa is zero outside I. 

In summary, we have 


Pr 

A~SW" 


[d,{\,Qe)>6] 


< Pr 

A~SW 2 


3J€J: 



^x{y)^ j{y)dy 


_ 6 _ 

4w 



(57) 


where the <5/2 comes from the event that A is not usual. We will therefore show that f A\{y)4/ j{y)dy 
is at most J- for all J G 77 with probability at least 1 — By the union bound, it suffices to show 


that for each J G 77, 

Let m be 
we can write 


f Ax(y)4'j(y)dy < ^ with probability at least 1 - 
Let m be the maximum degree of the Tj functions, for all J G 77. Fix an interval J G 77. Then 


~ /*oo ^ ^ ^oo 

'i’j(x) = and / Ax{y)4ij{y)dy = '^af'^ x^Ax{x)dx, 

k=0 k=0 


(58) 


where the are constants. The following proposition, found in [MellOa, Lemma 7], gives a nice 
expression for the integrals on the right-hand side. 


Proposition 7.5. Let k > 1. Then 


r k^ ^2 .%+i(A) 

/ X Ax[x)dx = 

J-oc {k + l)x/n 


where gfc(A) is the quantity defined as 

Pk+iW 


%(A) := 


1^1 

E 


km 


-1 


n 


k/2+l-e 


(A: + l)n^/2 ^ (fc + 1-7)7!(7-1)! #+i-2^ 


The key fact we will use is that we can upper bound the right-hand side of Equation (58) by a 
quantity which decays with d, independent of the value of n. This is the subject of the following 
lemma. 


Lemma 7.6. The random variable 




, for A ~ SW)J, has mean 0^(1), for all f{d) < n < Cd'^. 
Applying Proposition 7.5 and Lemma 7.6 to Equation (58), we see that
is Od(l). We may take d large enough to make this quantity arbitrarily sma’ 
dj so that for all d > dj, this expectation is at most ■ 


f Ax(x)^j(x)dx 
1. Thus, select 
Then by Markov’s inequality, 

f AA(x)'k j{x)dx < with probability at least 1 — to be the max of dj 

over all J ^ J, then by Equation (57), [^^i(-^) ^e) A <5] < 5 so long as d> do, and we are 

done. 

Now we turn to the proof of Lemma 7.6. 


Proof of Lemma 7. 6. Define 


Xk{X) ■■= 


and 




Xk+i{X) 

{k + l)n^/2 


/L:wt(/L) = fc 


E 

£=i 


m{pi) 


piW 


ki2i 


-1 


n 


k/ 2 +l-e 


{k + 1-£)£[{£-l)\ #+1-2^- 


(59) 


Then by Proposition 5.5, qk{£^) and q\{^) differ from each other by n times an observable 0{X) 
of weight k. Thus, 


E 

A-SW" 


9fc(A) 


n 


< 


E 

A-SW!] 




n 


+ E 

A~SW" 


0{X) 


7),(^+1)/2 


By Cauchy-Schwarz, < y^EC)(A)2/n^+i. Because O has weight k, has 

weight 2k. As a result, we can use the next proposition to bound the contribution from this term 
by Orf(l). 

Proposition 7.7. Let 0{X) be an observable of weight at most 2k. Then 


E 

A~SW 


ro(A)i 


n ry^ 1 

d 


Od(l)- 


Proof. As in the proof of Lemma 5.7, this reduces to showing that E^^sw^ 
where /U is a partition of weight 2k, i.e. \fi\ + £{fi) < 2k. By Corollary 2.35, 


pfj(A)/n^+^ 


Od(l), 


E 

A-SWJ 


ri^k+l 


fik+i ^ 1^1 — fik+i ^ 1^1 d?\^^\ 


If \gi\ < k + 1, then this expression is at most 1/n, which is 0^(1) because n > f{d) = Wd(l). On 
the other hand, if \gi\>k + 1, then for all n < Cdf this expression is at most 

{Cd?t\ ^ 

{Cd?f+^ d2|/.| - (p{k+l)' 


which is Orf(l) as wt(/x) < 2k. 


□ 
It remains to bound E |g|,(A)/-y/n| by Od{l)- First, we will show that Q'|(A) can be viewed as
(approximately) computing the deviation of a certain random variable from its mean. To do this, 
let us compute the mean of the first term on the right-hand side of Equation (59). 


E 


Xk+iW 


A~SW2 {k + (k + l)n^/2 


1 


{k + 

1 

{k + 

k + l 


E 

/i:wt(/i)=/c+l 

L^J 

E 


{k + 


m{n) 

[k + 


L^J 

E 

i=i 


i=i 

L^J 

■E 

i=i 




E 


/i:wt(/i)=/c+l 


m{p) 


{k + I (k-i 

#+1-2^ ayi-i 


{k + 1 - £)ei{i - 1)1 n^/2 . c^fc+i-2£ ’ 


where the third equality follows from [MellOa, Lemma 11]. As a result, the difference 


E 


Xk+iiX) 


L'^J 

E 


1.121 


-1 


n 


k/ 2 +l-e 


A~SWJ (A;-I-l)n^/2 ^ (A: + 1 - f')t'!(f'- 1)! cffc+i-2^ 

can be written as a sum over terms of the form a ■ rfi/, where a is a constant coefficient, 
1 < 6 < k/2 — and 1 < I < ■ Given that n < CcP, each of these terms if ±0^(1). Thus, if 

we set 


then 


_ 

(A: -|- l)n^/2 A~SW2 (k + l)n*^/2 ’ 

+ oi(fi). 


E 

qIw 

< E 

<lk{X) 

A~SW2 

y/n 

A~SWJ 

\/n 


Finally, we show that E |gfc(A)/\/n| = 0^(1). By Cauchy-Schwarz, 

qk{X) 


E 


n 


< 


fm’’ 


so it suffices to show that E {qk{X)/\/n)‘^ = 0^(1). This expectation is simply the variance of the 
random variable Xk^i{X)/{k + which itself is a weighted sum of a constant number 

of random variables of the form p^(A)/n(^^^^/^, where wt(^) = A: + 1. An easy application of 
Cauchy-Schwarz shows that the variance of a weighted sum of a constant number of random 
variables is 0^(1) if the variance of each random variables is 0^(1). Thus, we will show that 
YaLr\pli{X )= 0 ^( 1 ) for all wt{/j,) = k + 1. 

Fix a partition ^ of weight A; -|- 1. Then 


Var 


pIW 

j^{k+l)/2 


_ 

A~SW" 


1 


n{k+l)/2 Yfk 


pIWpIW - EbJ]^ 


By Proposition 2.39, pli{X) •P/'i(A) = + C'(A), where 0(A) is an observable of weight at most 

2 • wt(p^) — 2 = 2k. Then 


Var 


7T,(^+1)/2 


E 

A~SW2 


1 

nk+i 


+ E 

A~SW2 


1 


•0(A) 


The second term is ±0^(1) by Proposition 7.7. As for the first term, Corollary 2.35, shows that it 
equals 


1 




1 

^4|/.|-2(fc+l) 




n 


fc+i 


(60) 


where we used the fact that i{p) = wt(/i) — \p\ = k + 1 — \p\. The highest-degree term of both 
and (n'^'l^l)^ is so we can write 


(60) 


1 

^4|/.|-2(fc+l) 


2\^l\-{k+2) 

a;, • 

b=-(fc+i) 


for some constants ah- When 6 < 0, n^/d‘^1^1 2 fc 2 ^ which is 0^(1) because n > /(d) = a;d(l)- 
On the other hand, when b > 0, then this term is 0^(1) because n < CcP. □ 